US 12,248,875 B2
Controlling agents using scene memory data
Kuan Fang, Stanford, CA (US); and Alexander Toshkov Toshev, San Francisco, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Dec. 11, 2023, as Appl. No. 18/536,074.
Application 18/536,074 is a continuation of application No. 17/953,222, filed on Sep. 26, 2022, granted, now 11,842,277.
Application 17/953,222 is a continuation of application No. 16/602,702, filed on Nov. 20, 2019, granted, now 11,455,530, issued on Sep. 27, 2022.
Claims priority of provisional application 62/770,114, filed on Nov. 20, 2018.
Prior Publication US 2024/0220799 A1, Jul. 4, 2024
Int. Cl. G06N 3/08 (2023.01); G06N 3/045 (2023.01)

CPC G06N 3/08 (2013.01) [G06N 3/045 (2023.01)]

21 Claims

1. A method of controlling an agent interacting with an environment, the method comprising, at each of a plurality of time steps:

receiving a current observation characterizing a current state of the environment as of the time step;

generating an embedding of the current observation;

processing scene memory data comprising (i) embeddings of prior observations received at prior time steps and (ii) the embedding of the current observation using a self-attention neural network, wherein the self-attention neural network is configured to apply a self-attention mechanism to the scene memory data to generate an output that comprises an updated embedding of the current observation, and wherein the self-attention neural network comprises an attention block that is configured to, for each of one or more attention heads:

generate a set of queries that comprises a respective query for each of the embeddings of prior observations and for the embedding of the current observation;

generate a set of values that comprises a respective value for each of the embeddings of prior observations and for the embedding of the current observation;

generate a set of keys that comprises a respective key for each of the embeddings of prior observations and for the embedding of the current observation; and

apply a self-attention mechanism that uses the set of queries as a query input, the set of values as a value input, and the set of keys as a key input to generate an attention output;

selecting, based at least on the updated embedding of the current observation, an action from a set of possible actions to be performed by the agent in response to the current observation; and

causing the agent to perform the selected action.
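
For illustration only, the steps recited in claim 1 can be sketched as a small NumPy loop. This is a minimal sketch under stated assumptions, not the patented implementation: the class name SceneMemoryPolicy, the linear-plus-tanh embedding, the scaled dot-product form of the attention, the output projection, the greedy argmax action selection, and the random untrained weights are all hypothetical choices that the claim does not specify.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class SceneMemoryPolicy:
    """Hypothetical sketch of the claimed control loop (untrained random weights)."""

    def __init__(self, obs_dim, embed_dim, num_heads, num_actions, seed=0):
        assert embed_dim % num_heads == 0
        rng = np.random.default_rng(seed)
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        s = 1.0 / np.sqrt(embed_dim)
        head_shape = (num_heads, embed_dim, self.head_dim)
        self.w_embed = rng.normal(0.0, 1.0 / np.sqrt(obs_dim), (obs_dim, embed_dim))
        self.w_q = rng.normal(0.0, s, head_shape)  # per-head query projections
        self.w_k = rng.normal(0.0, s, head_shape)  # per-head key projections
        self.w_v = rng.normal(0.0, s, head_shape)  # per-head value projections
        self.w_out = rng.normal(0.0, s, (embed_dim, embed_dim))
        self.w_policy = rng.normal(0.0, s, (embed_dim, num_actions))
        self.memory = []  # embeddings of prior observations

    def embed(self, observation):
        # Assumed embedding: one linear layer followed by tanh.
        return np.tanh(observation @ self.w_embed)

    def attend(self, scene_memory):
        """Self-attention over scene memory of shape (T, embed_dim)."""
        heads = []
        for h in range(self.num_heads):
            q = scene_memory @ self.w_q[h]  # one query per stored embedding
            k = scene_memory @ self.w_k[h]  # one key per stored embedding
            v = scene_memory @ self.w_v[h]  # one value per stored embedding
            # Assumed scaled dot-product attention; weights have shape (T, T).
            weights = softmax(q @ k.T / np.sqrt(self.head_dim))
            heads.append(weights @ v)
        return np.concatenate(heads, axis=-1) @ self.w_out

    def step(self, observation):
        current = self.embed(observation)
        # Scene memory = prior embeddings plus the current embedding.
        scene_memory = np.stack(self.memory + [current])
        updated = self.attend(scene_memory)[-1]  # updated current embedding
        # Assumed greedy selection over action scores.
        action = int(np.argmax(updated @ self.w_policy))
        self.memory.append(current)
        return action  # the caller would cause the agent to perform this action


# Usage: five time steps with synthetic observations.
policy = SceneMemoryPolicy(obs_dim=8, embed_dim=16, num_heads=4, num_actions=3)
obs_rng = np.random.default_rng(1)
actions = [policy.step(obs_rng.normal(size=8)) for _ in range(5)]
```

In this sketch the last row of the attention output plays the role of the updated embedding of the current observation, and the scene memory grows by one embedding per time step. A trained system would learn the projection weights and would typically sample from or otherwise use a learned policy rather than the fixed argmax assumed here.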