| CPC G06N 3/08 (2013.01) [G06N 7/01 (2023.01)] | 20 Claims |

|
1. A method of selecting, using a neural network, actions to be performed by an agent interacting with an environment to perform a task in an attempt to achieve a specified result,
wherein the neural network has a plurality of network parameters and is configured to receive an input observation characterizing a state of the environment and to process the input observation in accordance with the network parameters to generate a network output that comprises an action selection output that defines an action selection policy for selecting an action to be performed by the agent in response to the input observation, and
wherein the method comprises:
receiving a current observation characterizing a current state of the environment;
determining a target action selection output for the current observation by performing, using the neural network and in accordance with current values of the network parameters, a look ahead search of possible future states of the environment starting from the current state until the environment reaches a possible future state that satisfies one or more termination criteria, wherein the look ahead search is a tree search of a state tree having nodes representing states of the environment starting from a root node that represents the current state, and wherein performing the look ahead search comprises adding noise to prior probabilities for the root node that are used to traverse from the root node to other nodes in the state tree; and
selecting an action to be performed by the agent in response to the current observation using the target action selection output generated by performing the look ahead search.
|
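
The root-noise step recited in claim 1 can be illustrated with a short sketch. The claim only requires that noise be added to the prior probabilities for the root node. The choice of Dirichlet noise, the function name `add_root_noise`, and the values of `epsilon` and `alpha` below are assumptions made for illustration and are not taken from the claim.

```python
import random


def add_root_noise(priors, epsilon=0.25, alpha=0.3, rng=None):
    """Blend root-node prior probabilities with sampled noise.

    priors:  dict mapping each action to its prior probability (sums to 1).
    epsilon: hypothetical mixing weight given to the noise.
    alpha:   hypothetical Dirichlet concentration parameter.
    """
    rng = rng or random.Random()
    actions = list(priors)

    # A Dirichlet sample can be drawn as independent Gamma draws, normalized.
    draws = [rng.gammavariate(alpha, 1.0) for _ in actions]
    total = sum(draws)
    if total == 0.0:
        # Extremely unlikely underflow case: fall back to uniform noise.
        noise = [1.0 / len(actions)] * len(actions)
    else:
        noise = [d / total for d in draws]

    # Convex combination keeps the result a valid probability distribution.
    return {
        a: (1.0 - epsilon) * priors[a] + epsilon * n
        for a, n in zip(actions, noise)
    }


# Example: perturb the priors of three candidate root actions.
root_priors = {"left": 0.5, "stay": 0.3, "right": 0.2}
noisy_priors = add_root_noise(root_priors, rng=random.Random(0))
```

In this sketch only the root node's priors are perturbed, so the tree search would use the unmodified priors when traversing deeper nodes. Setting `epsilon` to 0 returns the original priors unchanged.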