CPC G06N 3/08 (2013.01) [G06N 3/045 (2023.01)] | 21 Claims |
1. A method of controlling an agent interacting with an environment, the method comprising, at each of a plurality of time steps:
receiving a current observation characterizing a current state of the environment as of the time step;
generating an embedding of the current observation;
processing scene memory data comprising (i) embeddings of prior observations received at prior time steps and (ii) the embedding of the current observation using a self-attention neural network, wherein the self-attention neural network is configured to apply a self-attention mechanism to the scene memory data to generate an output that comprises an updated embedding of the current observation, and wherein the self-attention neural network comprises an attention block that is configured to, for each of one or more attention heads:
generate a set of queries that comprises a respective query for each of the embeddings of prior observations and for the embedding of the current observation;
generate a set of values that comprises a respective value for each of the embeddings of prior observations and for the embedding of the current observation;
generate a set of keys that comprises a respective key for each of the embeddings of prior observations and for the embedding of the current observation; and
apply a self-attention mechanism that uses the set of queries as a query input, the set of values as a value input, and the set of keys as a key input to generate an attention output;
selecting, based at least on the updated embedding of the current observation, an action from a set of possible actions to be performed by the agent in response to the current observation; and
causing the agent to perform the selected action.
|
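The steps recited in claim 1 can be illustrated with a minimal NumPy sketch. It is not the claimed implementation: the claim does not specify how embeddings are computed, how attention is scored, or how the action is selected. The sketch assumes a linear embedding with a tanh nonlinearity, scaled dot-product attention with a softmax, and greedy (argmax) action selection from a linear policy head. All names (`AttentionBlock`, `embed`, `control_step`, `w_embed`, `w_policy`) and all dimensions are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class AttentionBlock:
    """Multi-head self-attention over scene memory (assumed scaled dot-product form)."""

    def __init__(self, dim, num_heads, rng):
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        scale = 1.0 / np.sqrt(dim)
        # Hypothetical randomly initialised projections; a real system would train these.
        self.w_q = rng.normal(0.0, scale, (dim, dim))
        self.w_k = rng.normal(0.0, scale, (dim, dim))
        self.w_v = rng.normal(0.0, scale, (dim, dim))
        self.w_o = rng.normal(0.0, scale, (dim, dim))

    def __call__(self, memory):
        # memory: (n, dim); rows are prior-observation embeddings, then the current one.
        n, dim = memory.shape

        def split(x):
            # (n, dim) -> (heads, n, head_dim)
            return x.reshape(n, self.num_heads, self.head_dim).transpose(1, 0, 2)

        # One query, key, and value per memory row, for each attention head.
        q = split(memory @ self.w_q)
        k = split(memory @ self.w_k)
        v = split(memory @ self.w_v)

        scores = q @ k.transpose(0, 2, 1) / np.sqrt(self.head_dim)  # (heads, n, n)
        weights = softmax(scores, axis=-1)
        heads = weights @ v  # (heads, n, head_dim)

        merged = heads.transpose(1, 0, 2).reshape(n, dim)
        return merged @ self.w_o, weights


def embed(observation, w_embed):
    """Assumed embedding: a single linear layer followed by tanh."""
    return np.tanh(observation @ w_embed)


def control_step(observation, prior_embeddings, params):
    """One time step: embed, attend over scene memory, select an action."""
    current = embed(observation, params["w_embed"])
    memory = np.vstack(prior_embeddings + [current])  # scene memory data
    output, _ = params["attention"](memory)
    updated_current = output[-1]  # updated embedding of the current observation
    logits = updated_current @ params["w_policy"]
    action = int(np.argmax(logits))  # assumed greedy selection
    prior_embeddings.append(current)  # becomes a prior embedding at later steps
    return action


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obs_dim, dim, num_actions = 4, 8, 3
    params = {
        "w_embed": rng.normal(size=(obs_dim, dim)),
        "w_policy": rng.normal(size=(dim, num_actions)),
        "attention": AttentionBlock(dim=dim, num_heads=2, rng=rng),
    }
    prior_embeddings = []
    for t in range(5):
        observation = rng.normal(size=obs_dim)  # stand-in for an environment observation
        action = control_step(observation, prior_embeddings, params)
        print(f"step {t}: action {action}")
```

In this sketch the scene memory grows by one embedding per time step, and only the last output row (the updated embedding of the current observation) feeds action selection. The final step of the claim, causing the agent to perform the selected action, would be a call into the environment or actuator interface and is omitted here.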