| CPC G06N 3/084 (2013.01) [G06F 18/2148 (2023.01); G06N 3/047 (2023.01); G06N 3/08 (2013.01)] | 20 Claims |

|
1. A computer-implemented method comprising:
receiving a sequence of network inputs, wherein each network input is an input sequence that comprises a plurality of input positions;
processing the sequence of network inputs using a neural network to generate a network output, wherein the neural network comprises an attention layer associated with a short-term, episodic memory and a long-term, compressed memory, and wherein the processing comprises, for a particular network input in the sequence of network inputs:
updating the short-term, episodic memory to include new data, the new data comprising respective hidden states that were processed by the attention layer for a plurality of input positions in the particular network input;
determining a compression rate for a portion of existing data stored in the short-term, episodic memory, the portion of existing data comprising respective hidden states that were processed by the attention layer for a plurality of input positions in each of one or more previous network inputs that precede the particular network input in the sequence of network inputs;
compressing, in accordance with the compression rate, the portion of existing data to generate a compressed representation for the portion of existing data; and
updating the long-term, compressed memory to include the compressed representation for the portion of existing data.
|
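The per-input memory update recited in claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the claim does not specify how the compression rate is determined or which compression function is used, so the sketch assumes a fixed rate and mean pooling over consecutive hidden states. The names `CompressiveMemory`, `mean_pool`, and `choose_rate` are hypothetical, hidden states are plain lists of floats, and each network input is assumed to be no longer than the episodic memory capacity.

```python
from collections import deque
from typing import List, Sequence

Vector = List[float]


def mean_pool(states: Sequence[Vector], rate: int) -> List[Vector]:
    """Average each group of `rate` consecutive vectors (assumed compression function)."""
    pooled = []
    for start in range(0, len(states), rate):
        group = states[start:start + rate]
        dim = len(group[0])
        pooled.append([sum(v[i] for v in group) / len(group) for i in range(dim)])
    return pooled


class CompressiveMemory:
    """Hypothetical pairing of a short-term episodic memory with a long-term compressed memory."""

    def __init__(self, episodic_size: int, compressed_size: int, rate: int):
        self.episodic_size = episodic_size
        self.episodic: List[Vector] = []
        # A bounded deque drops its oldest entries once compressed_size is exceeded.
        self.compressed = deque(maxlen=compressed_size)
        self.rate = rate

    def choose_rate(self, old_states: Sequence[Vector]) -> int:
        # Assumption: a fixed rate. The claim only requires that a rate be determined.
        return self.rate

    def update(self, hidden_states: Sequence[Vector]) -> None:
        # 1. Add the hidden states for the current network input to episodic memory.
        self.episodic.extend(hidden_states)
        overflow = len(self.episodic) - self.episodic_size
        if overflow <= 0:
            return
        # 2. Select the oldest states, which come from previous network inputs.
        old = self.episodic[:overflow]
        self.episodic = self.episodic[overflow:]
        # 3. Determine the compression rate for that portion and compress it.
        rate = self.choose_rate(old)
        # 4. Store the compressed representation in the long-term memory.
        self.compressed.extend(mean_pool(old, rate))


mem = CompressiveMemory(episodic_size=4, compressed_size=8, rate=2)
mem.update([[1.0], [2.0], [3.0], [4.0]])  # fills episodic memory, nothing to compress yet
mem.update([[5.0], [6.0], [7.0], [8.0]])  # evicts and compresses the first four states
print(list(mem.compressed))  # [[1.5], [3.5]]
```

With a rate of 2, the four evicted hidden states are reduced to two compressed entries, while the episodic memory keeps the four most recent states uncompressed for the attention layer to use.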