US 12,033,055 B2
Gated attention neural networks
Emilio Parisotto, London (GB); Hasuk Song, London (GB); Jack William Rae, London (GB); Siddhant Madhu Jayakumar, London (GB); Maxwell Elliot Jaderberg, London (GB); Razvan Pascanu, London (GB); and Caglar Gulcehre, London (GB)
Assigned to DeepMind Technologies Limited, London (GB)
Appl. No. 17/763,984
Filed by DeepMind Technologies Limited, London (GB)
PCT Filed Sep. 7, 2020, PCT No. PCT/EP2020/074913
§ 371(c)(1), (2) Date Mar. 25, 2022,
PCT Pub. No. WO2021/058270, PCT Pub. Date Apr. 1, 2021.
Claims priority of provisional application 62/906,032, filed on Sep. 25, 2019.
Prior Publication US 2022/0366218 A1, Nov. 17, 2022
Int. Cl. G06N 3/044 (2023.01); G06N 3/048 (2023.01); G06N 3/08 (2023.01)
CPC G06N 3/044 (2023.01) [G06N 3/048 (2023.01); G06N 3/08 (2013.01)] 10 Claims
OG exemplary drawing
 
1. A system comprising one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to implement an attention neural network that is configured to receive an input sequence and to process the input sequence to generate an output, the attention neural network comprising:
an attention block configured to receive a query input, a key input, and a value input that are derived from an attention block input, the attention block comprising:
an attention neural network layer configured to:
receive an attention layer input derived from the query input, the key input, and the value input, and
apply an attention mechanism to the attention layer input to generate an attention layer output for the attention neural network layer; and
a gating neural network layer configured to apply a gating mechanism to the attention block input and the attention layer output of the attention neural network layer to generate a gated attention output.
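The claimed structure, in which a gating layer combines the attention block input with the attention layer output, can be illustrated with a minimal NumPy sketch. This is not the patented implementation: it assumes single-head scaled dot-product attention and a sigmoid gate, and all names (`Wq`, `Wk`, `Wv`, `Wg`, `Ug`, `bg`) are illustrative parameters, not terms from the patent.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention mechanism over the attention layer input
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    return softmax(scores) @ v

def gated_attention_block(x, Wq, Wk, Wv, Wg, Ug, bg):
    """One attention block: x is the attention block input of shape
    (sequence_length, d); all weights are (d, d), bg is (d,)."""
    # query, key, and value inputs derived from the attention block input
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    y = attention(q, k, v)                         # attention layer output
    # gating layer: sigmoid gate computed from the block input x
    # and the attention layer output y (an assumed gate form)
    g = 1.0 / (1.0 + np.exp(-(x @ Wg + y @ Ug + bg)))
    return (1.0 - g) * x + g * y                   # gated attention output
```

When the gate saturates toward zero the block passes its input through unchanged, which is one motivation for gating an attention block in place of a plain residual connection.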