| CPC G06F 18/2148 (2023.01) [G06F 17/11 (2013.01); G06N 3/045 (2023.01); G06N 3/08 (2013.01)] | 20 Claims |

|
1. A method of training a neural network system, the method comprising:
training an action selection neural network to perform a task,
wherein the action selection neural network is configured to receive inputs comprising observations of an environment and to process the inputs to generate action selection outputs indicating actions to perform the task, and
during training of the action selection neural network:
training an auxiliary prediction task neural network, wherein the auxiliary prediction task neural network is configured to receive an intermediate output from the action selection neural network and to generate a prediction output which indicates a predicted characteristic of the task,
wherein training the auxiliary prediction task neural network comprises training the auxiliary prediction task neural network and the action selection neural network using demonstration data for the task by backpropagating gradients determined from an auxiliary learning loss function through the auxiliary prediction task neural network and into the action selection neural network to bring the predicted characteristic closer to a corresponding observed characteristic of the task from the demonstration data.
|
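The auxiliary training step recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: the network sizes, the tanh trunk, the squared-error auxiliary loss, the learning rate, and all names (`trunk_w`, `policy_w`, `aux_w`, `aux_update`, `demos`) are assumptions chosen for illustration, and the demonstration data is invented. The main task loss that trains the policy head is omitted. The sketch shows only that gradients of an auxiliary loss computed on demonstration data flow through the auxiliary prediction network and into the parameters of the action selection network that produce the intermediate output.

```python
import math
import random

random.seed(0)

OBS_DIM, HIDDEN_DIM, NUM_ACTIONS = 3, 4, 2  # illustrative sizes (assumed)


def init_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Action selection network: a trunk producing the intermediate output,
# followed by a policy head producing action selection outputs.
trunk_w = init_matrix(HIDDEN_DIM, OBS_DIM)
policy_w = init_matrix(NUM_ACTIONS, HIDDEN_DIM)
# Auxiliary prediction network: a linear head on the intermediate output.
aux_w = [random.uniform(-0.5, 0.5) for _ in range(HIDDEN_DIM)]


def trunk_forward(obs):
    """Intermediate output of the action selection network."""
    return [math.tanh(sum(w * x for w, x in zip(row, obs))) for row in trunk_w]


def action_probs(obs):
    """Action selection output: softmax over action logits."""
    h = trunk_forward(obs)
    logits = [sum(w * v for w, v in zip(row, h)) for row in policy_w]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aux_predict(obs):
    """Predicted characteristic of the task."""
    return sum(w * v for w, v in zip(aux_w, trunk_forward(obs)))


def aux_loss(demos):
    """Mean squared error between predicted and observed characteristics."""
    return sum((aux_predict(o) - y) ** 2 for o, y in demos) / len(demos)


def aux_update(demos, lr=0.1):
    """One SGD pass: backpropagate the auxiliary loss into aux head and trunk."""
    for obs, target in demos:
        h = trunk_forward(obs)
        pred = sum(w * v for w, v in zip(aux_w, h))
        d_pred = 2.0 * (pred - target)      # d(loss)/d(pred) for squared error
        d_h = [d_pred * w for w in aux_w]   # gradient at the intermediate output
        for j in range(HIDDEN_DIM):
            aux_w[j] -= lr * d_pred * h[j]  # update auxiliary network
            d_pre = d_h[j] * (1.0 - h[j] ** 2)  # backpropagate through tanh
            for i in range(OBS_DIM):
                trunk_w[j][i] -= lr * d_pre * obs[i]  # update action selection trunk


# Invented demonstration data: (observation, observed characteristic) pairs.
demos = [([1.0, 0.0, 0.5], 0.8), ([0.0, 1.0, -0.5], -0.3), ([0.0, 0.0, 1.0], 0.2)]

loss_before = aux_loss(demos)
trunk_before = [row[:] for row in trunk_w]
for _ in range(200):
    aux_update(demos)
loss_after = aux_loss(demos)
probs = action_probs(demos[0][0])
```

In a fuller training loop, `aux_update` would be interleaved with updates from the main task loss (for example, a reinforcement learning or imitation objective), so that the trunk is shaped by both signals during training.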