US 12,443,851 B2
Augmenting attention-based neural networks to selectively attend to past inputs
Jack William Rae, London (GB); Anna Potapenko, London (GB); and Timothy Paul Lillicrap, London (GB)
Assigned to GDM Holding LLC, Mountain View, CA (US)
Filed by DeepMind Technologies Limited, London (GB)
Filed on Oct. 12, 2023, as Appl. No. 18/486,060.
Application 18/486,060 is a continuation of application No. 17/033,396, filed on Sep. 25, 2020, granted, now Pat. No. 11,829,884.
Claims priority of provisional application 62/906,078, filed on Sep. 25, 2019.
Prior Publication US 2024/0046103 A1, Feb. 8, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 3/04 (2023.01); G06F 18/214 (2023.01); G06N 3/047 (2023.01); G06N 3/08 (2023.01); G06N 3/084 (2023.01)

CPC G06N 3/084 (2013.01) [G06F 18/2148 (2023.01); G06N 3/047 (2023.01); G06N 3/08 (2013.01)]

20 Claims

1. A computer-implemented method comprising:

receiving a sequence of network inputs, wherein each network input is an input sequence that comprises a plurality of input positions;

processing the sequence of network inputs using a neural network to generate a network output, wherein the neural network comprises an attention layer associated with a short-term, episodic memory and a long-term, compressed memory, and wherein the processing comprises, for a particular network input in the sequence of network inputs:

updating the short-term, episodic memory to include new data, the new data comprising respective hidden states that were processed by the attention layer for a plurality of input positions in the particular network input;

determining a compression rate for a portion of existing data stored in the short-term, episodic memory, the portion of existing data comprising respective hidden states that were processed by the attention layer for a plurality of input positions in each of one or more previous network inputs that precede the particular network input in the sequence of network inputs;

compressing, in accordance with the compression rate, the portion of existing data to generate a compressed representation for the portion of existing data; and

updating the long-term, compressed memory to include the compressed representation for the portion of existing data.
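
The memory update recited in claim 1 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the function names (`mean_pool`, `update_memories`), the fixed-capacity short-term memory, the choice of mean pooling as the compression function, and the treatment of the compression rate as a caller-supplied integer are all hypothetical. The claim itself does not specify how the compression rate is determined or which compression function is used.

```python
from typing import List, Tuple

Vector = List[float]


def mean_pool(states: List[Vector], rate: int) -> List[Vector]:
    """Compress hidden states by averaging each run of `rate` consecutive vectors.

    Mean pooling is an assumed compression function chosen for illustration.
    """
    compressed = []
    for start in range(0, len(states), rate):
        group = states[start:start + rate]
        dim = len(group[0])
        compressed.append([sum(v[i] for v in group) / len(group) for i in range(dim)])
    return compressed


def update_memories(
    short_term: List[Vector],
    long_term: List[Vector],
    new_states: List[Vector],
    short_term_size: int,
    rate: int,
) -> Tuple[List[Vector], List[Vector]]:
    """Return the updated (short_term, long_term) memories for one network input.

    `short_term_size` and `rate` are hypothetical parameters; here the
    compression rate is simply supplied by the caller.
    """
    # Add the hidden states for the current network input to short-term memory.
    short_term = short_term + new_states

    # The oldest states that no longer fit are the portion of existing data
    # to be compressed.
    overflow = max(0, len(short_term) - short_term_size)
    evicted, short_term = short_term[:overflow], short_term[overflow:]

    # Compress the evicted states and append the result to long-term memory.
    if evicted:
        long_term = long_term + mean_pool(evicted, rate)
    return short_term, long_term
```

With a short-term capacity of 4 and a compression rate of 2, feeding two network inputs of four one-dimensional hidden states each (values 1.0 to 4.0, then 5.0 to 8.0) leaves the second input's states in short-term memory and stores the first input's states in long-term memory as two averaged vectors, `[1.5]` and `[3.5]`.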