| CPC G06N 3/08 (2013.01) [G06N 3/006 (2013.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01)] | 20 Claims |

|
1. A system for selecting actions to be performed by an agent interacting with an environment, the system comprising:
a high-level controller neural network, wherein the high-level controller neural network is configured to:
receive an input observation characterizing a state of the environment, and
process the input observation to generate a high-level output defining a control signal for a low-level controller neural network;
the low-level controller neural network, wherein the low-level controller neural network is configured to:
receive a designated component of the input observation, and
process (i) the designated component of the input observation and (ii) an input control signal generated by the high-level controller neural network to generate a low-level output that defines an action to be performed by the agent to control movement of the agent in response to the input observation; and
a subsystem, wherein the subsystem is configured to:
receive a current observation characterizing a current state of the environment;
determine whether a threshold number of observations have been received since an existing control signal was generated by the high-level controller neural network, wherein the threshold number is greater than one;
in response to a determination that the threshold number of observations have been received:
provide the current observation as input to the high-level controller neural network to determine a new control signal; and
provide the designated component of the current observation and the new control signal as input to the low-level controller neural network to select an action to be performed by the agent in response to the current observation,
wherein the high-level controller neural network is configured to generate the high-level output at a high-level control rate and the low-level controller neural network is configured to generate the low-level output at a low-level control rate, and wherein the high-level control rate is slower than the low-level control rate.
|
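The control flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the two neural networks are replaced by plain callables, and the names `HierarchicalController`, `toy_high_level`, `toy_low_level`, and the observation keys `"vision"` and `"proprioception"` are hypothetical. The sketch also assumes that the high-level controller runs on the first observation, when no control signal exists yet, and that the count of observations restarts each time a new control signal is generated.

```python
from typing import Callable, Dict, List, Optional

Observation = Dict[str, List[float]]


class HierarchicalController:
    """Illustrative subsystem: refreshes the control signal every `threshold` observations."""

    def __init__(
        self,
        high_level: Callable[[Observation], float],
        low_level: Callable[[List[float], float], List[float]],
        designated_key: str,
        threshold: int,
    ) -> None:
        # The claim requires the threshold number to be greater than one.
        if threshold <= 1:
            raise ValueError("threshold must be greater than one")
        self.high_level = high_level
        self.low_level = low_level
        self.designated_key = designated_key
        self.threshold = threshold
        self.high_level_calls = 0
        self._control_signal: Optional[float] = None
        self._observations_since_signal = 0

    def select_action(self, observation: Observation) -> List[float]:
        if self._control_signal is None:
            # Assumption: with no existing control signal, generate one immediately.
            refresh = True
        else:
            self._observations_since_signal += 1
            refresh = self._observations_since_signal >= self.threshold
        if refresh:
            # The high-level controller sees the full observation.
            self._control_signal = self.high_level(observation)
            self._observations_since_signal = 0
            self.high_level_calls += 1
        # The low-level controller sees only the designated component,
        # together with the most recent control signal.
        component = observation[self.designated_key]
        return self.low_level(component, self._control_signal)


def toy_high_level(observation: Observation) -> float:
    # Hypothetical stand-in for the high-level network: mean of the "vision" features.
    vision = observation["vision"]
    return sum(vision) / len(vision)


def toy_low_level(component: List[float], control_signal: float) -> List[float]:
    # Hypothetical stand-in for the low-level network: shift each feature by the signal.
    return [x + control_signal for x in component]


controller = HierarchicalController(
    toy_high_level, toy_low_level, designated_key="proprioception", threshold=3
)
observations = [
    {"vision": [float(t), float(t)], "proprioception": [0.0, 1.0]} for t in range(7)
]
actions = [controller.select_action(obs) for obs in observations]
```

In this run the low-level callable produces an action for every one of the seven observations, while the high-level callable is invoked only at steps 0, 3, and 6, so the high-level control rate is slower than the low-level control rate.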