CPC G06N 3/08 (2013.01) [G06N 3/006 (2013.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01)] | 21 Claims |
1. A system for selecting actions to be performed by an agent interacting with an environment, the system comprising:
a high-level controller neural network, wherein the high-level controller neural network is configured to:
receive an input observation characterizing a state of the environment, and
process the input observation to generate a high-level output that parameterizes a distribution over possible control signals from which a control signal is selected to modulate behavior of a low-level controller neural network;
the low-level controller neural network, wherein the low-level controller neural network is configured to:
receive a designated component of the input observation, and
process (i) the designated component of the input observation and (ii) an input control signal generated by the high-level controller neural network to generate a low-level output that parameterizes an action distribution over possible actions from which an action is selected to be performed by the agent to control movement of the agent in response to the input observation; and
a subsystem, wherein the subsystem is configured to:
receive a current observation characterizing a current state of the environment;
determine whether criteria are satisfied for generating a new control signal;
when the criteria are satisfied:
provide the current observation as input to the high-level controller neural network to determine a new control signal, and
provide the designated component of the current observation and the new control signal as input to the low-level controller neural network to select an action to be performed by the agent in response to the current observation; and
when the criteria are not satisfied:
provide the designated component of the current observation and an existing control signal defined by a high-level output previously generated by the high-level controller neural network as input to the low-level controller neural network to select an action to be performed by the agent in response to the current observation.
|
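The dispatch logic recited for the subsystem in claim 1 can be illustrated with a minimal sketch. Nothing below comes from the patent itself beyond the overall control flow: the names `high_level_controller`, `low_level_controller`, `Subsystem`, `designated_key`, and `refresh_every` are hypothetical, the two "networks" are trivial stand-in functions rather than neural networks, both distributions are assumed to be one-dimensional Gaussians, and the criterion for generating a new control signal is assumed to be a fixed step interval (the claim leaves the criteria open).

```python
import random
from typing import Dict, List, Optional, Tuple

Observation = Dict[str, List[float]]


def high_level_controller(observation: Observation) -> Tuple[float, float]:
    """Stand-in for the high-level network (assumption: not a real network).

    Takes the full observation and returns (mean, std), which parameterize
    a Gaussian distribution over possible control signals.
    """
    flat = [x for values in observation.values() for x in values]
    return sum(flat) / len(flat), 0.1


def low_level_controller(component: List[float], control_signal: float) -> Tuple[float, float]:
    """Stand-in for the low-level network (assumption: not a real network).

    Takes only the designated component of the observation plus a control
    signal and returns (mean, std) of a Gaussian over possible actions.
    """
    return sum(component) / len(component) + control_signal, 0.05


class Subsystem:
    """Routes observations to the two controllers, reusing control signals."""

    def __init__(self, designated_key: str = "proprioception",
                 refresh_every: int = 3, seed: int = 0) -> None:
        self.designated_key = designated_key  # hypothetical component name
        self.refresh_every = refresh_every    # assumed criterion: fixed interval
        self.rng = random.Random(seed)
        self.control_signal: Optional[float] = None
        self.steps_since_refresh = 0
        self.refreshes = 0

    def criteria_satisfied(self) -> bool:
        # A new control signal is needed at the first step and then every
        # `refresh_every` steps (an illustrative choice of criteria).
        return (self.control_signal is None
                or self.steps_since_refresh >= self.refresh_every)

    def select_action(self, observation: Observation) -> float:
        component = observation[self.designated_key]
        if self.criteria_satisfied():
            # Criteria satisfied: the full observation goes to the high-level
            # controller and a new control signal is sampled from its output.
            mean, std = high_level_controller(observation)
            self.control_signal = self.rng.gauss(mean, std)
            self.steps_since_refresh = 0
            self.refreshes += 1
        # In both branches the low-level controller sees only the designated
        # component and the (new or existing) control signal.
        mean, std = low_level_controller(component, self.control_signal)
        self.steps_since_refresh += 1
        return self.rng.gauss(mean, std)


if __name__ == "__main__":
    subsystem = Subsystem()
    obs = {"proprioception": [0.1, 0.2], "vision": [0.5, 0.4]}
    for step in range(7):
        action = subsystem.select_action(obs)
        print(step, round(subsystem.control_signal, 3), round(action, 3))
```

In this sketch the control signal is held constant between refreshes, so the low-level controller acts at every step while the high-level controller runs only when the assumed interval elapses. Other criteria (for example, an event in the environment) would only change `criteria_satisfied`.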