US 11,875,258 B1
Selecting reinforcement learning actions using a low-level controller
Nicolas Manfred Otto Heess, London (GB); Timothy Paul Lillicrap, London (GB); Gregory Duncan Wayne, London (GB); and Yuval Tassa, London (GB)
Assigned to DeepMind Technologies Limited, London (GB)
Filed by DeepMind Technologies Limited, London (GB)
Filed on Dec. 2, 2021, as Appl. No. 17/541,186.
Application 17/541,186 is a continuation of application No. 15/594,228, filed on May 12, 2017, granted, now 11,210,585.
Claims priority of provisional application 62/339,789, filed on May 20, 2016.
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 3/08 (2023.01); G06N 3/006 (2023.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01)
CPC G06N 3/08 (2013.01) [G06N 3/006 (2013.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01)] 21 Claims
 
1. A system for selecting actions to be performed by an agent interacting with an environment, the system comprising:
a high-level controller neural network, wherein the high-level controller neural network is configured to:
receive an input observation characterizing a state of the environment, and
process the input observation to generate a high-level output that parameterizes a distribution over possible control signals from which a control signal is selected to modulate behavior of a low-level controller neural network;
the low-level controller neural network, wherein the low-level controller neural network is configured to:
receive a designated component of the input observation, and
process (i) the designated component of the input observation and (ii) an input control signal generated by the high-level controller neural network to generate a low-level output that parameterizes an action distribution over possible actions from which an action is selected to be performed by the agent to control movement of the agent in response to the input observation; and
a subsystem, wherein the subsystem is configured to:
receive a current observation characterizing a current state of the environment;
determine whether criteria are satisfied for generating a new control signal;
when the criteria are satisfied:
provide the current observation as input to the high-level controller neural network to determine a new control signal, and
provide the designated component of the current observation and the new control signal as input to the low-level controller neural network to select an action to be performed by the agent in response to the current observation; and
when the criteria are not satisfied:
provide the designated component of the current observation and an existing control signal defined by a high-level output previously generated by the high-level controller neural network as input to the low-level controller neural network to select an action to be performed by the agent in response to the current observation.
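The two-controller arrangement and the subsystem's update logic in claim 1 can be illustrated with the minimal, pure-Python sketch below. It is not DeepMind's implementation, and it rests on several assumptions that the claim leaves open. Both controllers are hypothetical single linear layers with random weights, and each outputs the mean and log standard deviation of a diagonal Gaussian. The "designated component" is assumed to be a proprioceptive slice of the observation. The unspecified "criteria" are assumed to mean that no control signal exists yet or that a fixed period of `period` time steps has elapsed. All class, function, and parameter names are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


def linear(weights: List[List[float]], x: List[float]) -> List[float]:
    """Matrix-vector product: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def random_weights(n_out: int, n_in: int, rng: random.Random) -> List[List[float]]:
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def sample_gaussian(mean: List[float], log_std: List[float], rng: random.Random) -> List[float]:
    """Draw one sample from a diagonal Gaussian given its mean and log std."""
    return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mean, log_std)]


class GaussianPolicyHead:
    """Hypothetical stand-in for a controller network: input -> (mean, log_std)."""

    def __init__(self, n_in: int, n_out: int, rng: random.Random):
        self.w_mean = random_weights(n_out, n_in, rng)
        self.log_std = [-1.0] * n_out  # assumption: fixed, input-independent spread

    def __call__(self, x: List[float]) -> Tuple[List[float], List[float]]:
        return linear(self.w_mean, x), list(self.log_std)


@dataclass
class Observation:
    proprioceptive: List[float]  # assumed "designated component"
    task: List[float]            # remaining, task-related features

    def full(self) -> List[float]:
        return self.proprioceptive + self.task


class HierarchicalAgent:
    def __init__(self, n_proprio: int, n_task: int, n_control: int,
                 n_action: int, period: int, seed: int = 0):
        self.rng = random.Random(seed)
        # High-level controller sees the whole observation.
        self.high = GaussianPolicyHead(n_proprio + n_task, n_control, self.rng)
        # Low-level controller sees only the designated component plus the control signal.
        self.low = GaussianPolicyHead(n_proprio + n_control, n_action, self.rng)
        self.period = period
        self.step_count = 0
        self.control: Optional[List[float]] = None
        self.num_high_level_calls = 0

    def criteria_satisfied(self) -> bool:
        # Assumed criterion: no control signal yet, or `period` steps have elapsed.
        return self.control is None or self.step_count % self.period == 0

    def act(self, obs: Observation) -> List[float]:
        if self.criteria_satisfied():
            # Criteria met: sample a new control signal from the high-level output.
            mean, log_std = self.high(obs.full())
            self.control = sample_gaussian(mean, log_std, self.rng)
            self.num_high_level_calls += 1
        # In either case, the low-level controller uses the current (new or existing) signal.
        mean, log_std = self.low(obs.proprioceptive + self.control)
        action = sample_gaussian(mean, log_std, self.rng)
        self.step_count += 1
        return action


# Usage: 25 steps with a control signal refreshed every 10 steps.
agent = HierarchicalAgent(n_proprio=4, n_task=3, n_control=2, n_action=5, period=10)
env_rng = random.Random(1)
actions = []
for t in range(25):
    obs = Observation(proprioceptive=[env_rng.uniform(-1, 1) for _ in range(4)],
                      task=[env_rng.uniform(-1, 1) for _ in range(3)])
    actions.append(agent.act(obs))
print(len(actions), agent.num_high_level_calls)  # prints "25 3"
```

Under the assumed periodic criterion, the high-level controller runs only at steps 0, 10, and 20. The low-level controller runs at every step and reuses the held control signal in between. This is one way to realize the two-timescale behavior the claim describes, and any other criterion could be substituted in `criteria_satisfied`.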