US 12,367,391 B1
Selecting reinforcement learning actions using a low-level controller
Nicolas Manfred Otto Heess, London (GB); Timothy Paul Lillicrap, London (GB); Gregory Duncan Wayne, London (GB); and Yuval Tassa, London (GB)
Assigned to DeepMind Technologies Limited, London (GB)
Filed by DeepMind Technologies Limited, London (GB)
Filed on Dec. 27, 2023, as Appl. No. 18/398,046.
Application 18/398,046 is a continuation of application No. 17/541,186, filed on Dec. 2, 2021, granted, now 11,875,258.
Application 17/541,186 is a continuation of application No. 15/594,228, filed on May 12, 2017, granted, now 11,210,585, issued on Dec. 28, 2021.
Claims priority of provisional application 62/339,789, filed on May 20, 2016.
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 3/08 (2023.01); G06N 3/006 (2023.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01)

CPC G06N 3/08 (2013.01) [G06N 3/006 (2013.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01)]

20 Claims

1. A system for selecting actions to be performed by an agent interacting with an environment, the system comprising:

a high-level controller neural network, wherein the high-level controller neural network is configured to:

receive an input observation characterizing a state of the environment, and

process the input observation to generate a high-level output defining a control signal for a low-level controller neural network;

the low-level controller neural network, wherein the low-level controller neural network is configured to:

receive a designated component of the input observation, and

process (i) the designated component of the input observation and (ii) an input control signal generated by the high-level controller neural network to generate a low-level output that defines an action to be performed by the agent to control movement of the agent in response to the input observation; and

a subsystem, wherein the subsystem is configured to:

receive a current observation characterizing a current state of the environment;

determine whether a threshold number of observations have been received since an existing control signal was generated by the high-level controller neural network, wherein the threshold number is greater than one;

in response to a determination that the threshold number of observations have been received:

provide the current observation as input to the high-level controller neural network to determine a new control signal; and

provide the designated component of the current observation and the new control signal as input to the low-level controller neural network to select an action to be performed by the agent in response to the current observation,

wherein the high-level controller neural network is configured to generate the high-level output at a high-level control rate and the low-level controller neural network is configured to generate the low-level output at a low-level control rate, and wherein the high-level control rate is slower than the low-level control rate.
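
The control flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The class name `HierarchicalSubsystem`, the `designated` selector, the default threshold of 4, and the plain callables standing in for the two neural networks are all assumptions made for illustration. The claim does not say how the first control signal is obtained, so the sketch assumes the high-level controller is queried on the very first observation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Vector = List[float]


@dataclass
class HierarchicalSubsystem:
    """Hypothetical subsystem that routes observations to the two controllers."""

    high_level: Callable[[Vector], Vector]         # full observation -> control signal
    low_level: Callable[[Vector, Vector], Vector]  # (designated component, control signal) -> action
    designated: Callable[[Vector], Vector]         # extracts the designated component
    threshold: int = 4                             # assumed value; the claim only requires > 1
    _control: Optional[Vector] = None              # existing control signal, if any
    _since_update: int = 0                         # observations received since it was generated
    high_level_calls: int = 0                      # bookkeeping for the example only

    def __post_init__(self) -> None:
        if self.threshold <= 1:
            raise ValueError("threshold must be greater than one")

    def step(self, observation: Vector) -> Vector:
        if self._control is None:
            # Assumption: no control signal exists yet, so generate one immediately.
            refresh = True
        else:
            # This observation arrived after the existing control signal was generated.
            self._since_update += 1
            refresh = self._since_update >= self.threshold

        if refresh:
            # The high-level controller sees the whole observation (slow rate).
            self._control = self.high_level(observation)
            self._since_update = 0
            self.high_level_calls += 1

        # The low-level controller runs on every observation (fast rate) and
        # sees only the designated component plus the current control signal.
        return self.low_level(self.designated(observation), self._control)


# Toy stand-ins for the two networks (assumptions, not trained models).
controller = HierarchicalSubsystem(
    high_level=lambda obs: [sum(obs)],
    low_level=lambda comp, ctrl: [c + ctrl[0] for c in comp],
    designated=lambda obs: obs[:2],
)

# Ten observations: the low-level controller acts ten times, while the
# high-level controller is queried only at observations 0, 4, and 8.
actions = [controller.step([float(t), 1.0, 2.0]) for t in range(10)]
```

In this sketch the low-level controller produces an action for every observation while the high-level controller is consulted once every `threshold` observations, which mirrors the claimed relationship between the slower high-level control rate and the faster low-level control rate.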