US 12,353,976 B2
Gated attention neural networks
Emilio Parisotto, London (GB); Hasuk Song, London (GB); Jack William Rae, London (GB); Siddhant Madhu Jayakumar, London (GB); Maxwell Elliot Jaderberg, London (GB); Razvan Pascanu, Letchworth Garden City (GB); and Caglar Gulcehre, Lausanne (CH)
Assigned to DeepMind Technologies Limited, London (GB)
Filed by DeepMind Technologies Limited, London (GB)
Filed on May 30, 2024, as Appl. No. 18/679,200.
Application 18/679,200 is a continuation of application No. 17/763,984, granted, now 12,033,055, previously published as PCT/EP2020/074913, filed on Sep. 7, 2020.
Claims priority of provisional application 62/906,032, filed on Sep. 25, 2019.
Prior Publication US 2024/0320469 A1, Sep. 26, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 3/044 (2023.01); G06N 3/048 (2023.01); G06N 3/08 (2023.01)
CPC G06N 3/044 (2023.01) [G06N 3/048 (2023.01); G06N 3/08 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A system comprising one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to implement an attention neural network that is configured to receive a network input and to process the network input to generate an output, the attention neural network comprising:
an attention block configured to receive a query input, a key input, and a value input that are derived from an attention block input, the attention block comprising:
an attention neural network layer configured to:
receive an attention layer input derived from the query input, the key input, and the value input, and
apply an attention mechanism to the attention layer input to generate an attention layer output for the attention neural network layer; and
a gating neural network layer configured to apply a gating mechanism to the attention block input and the attention layer output of the attention neural network layer to generate a gated attention output.
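The claim above can be sketched in code: queries, keys, and values are derived from the attention block input, an attention mechanism produces an attention layer output, and a gating layer combines the block input with that output. The claim does not fix a particular gate, so the sigmoid highway-style gate below (and all weight names `Wq`, `Wk`, `Wv`, `Wg`, `Ug`, `bg`) is an illustrative assumption, not the patented method.

```python
import numpy as np

def softmax(z, axis=-1):
    # numerically stable softmax
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_layer(q, k, v):
    # scaled dot-product attention over a sequence of length T
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def gated_attention_block(x, Wq, Wk, Wv, Wg, Ug, bg):
    # query, key, and value inputs derived from the attention block input x
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # attention layer output
    y = attention_layer(q, k, v)
    # gating mechanism applied to the block input x and attention output y;
    # a sigmoid gate interpolating between them is one possible instantiation
    g = 1.0 / (1.0 + np.exp(-(x @ Wg + y @ Ug + bg)))
    return g * x + (1.0 - g) * y
```

With a gate value near 1 the block passes the input through largely unchanged, which is one motivation given in the related literature for gating attention blocks; other gate forms (e.g. GRU-type updates) fit the same claim structure.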