| CPC G06N 3/08 (2013.01) [G06N 3/045 (2023.01)] | 19 Claims |

|
1. A system comprising:
a user computer; and
a computer system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations comprising:
receiving, from the user computer, input data specifying an input sequence comprising a plurality of input tokens of a natural language;
at each of a plurality of generation time steps:
generating a combined sequence for the generation time step that includes the input sequence followed by output tokens that have already been generated as of the generation time step;
processing the combined sequence using a self-attention decoder neural network that comprises a plurality of masked self-attention neural network layers, and wherein the self-attention decoder neural network is configured to process the combined sequence to generate a time step output; and
determining a respective output token using the time step output; and
providing, to the user computer, output data specifying an output sequence comprising the output tokens determined for the plurality of generation time steps.
|
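The generation loop recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: the names `embed`, `masked_self_attention`, `decoder`, and `generate`, the toy vocabulary size, the untrained sine-based embeddings, the residual connection, the omission of positional encodings, and the greedy (argmax) token selection are all assumptions chosen for illustration. The sketch shows only the structure of the operations: at each generation time step, a combined sequence (input tokens followed by the output tokens generated so far) is processed by a stack of masked self-attention layers, and the resulting time step output is used to determine the next output token.

```python
import math

VOCAB_SIZE = 5   # hypothetical toy vocabulary of token ids 0..4
DIM = 4          # hypothetical embedding width


def embed(token_id):
    # Hypothetical fixed embedding; a real model would use learned weights.
    return [math.sin((token_id + 1) * (k + 1)) for k in range(DIM)]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def masked_self_attention(xs):
    # Causal mask: position i attends only to positions 0..i.
    out = []
    for i, query in enumerate(xs):
        scores = [
            sum(a * b for a, b in zip(query, xs[j])) / math.sqrt(DIM)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        out.append([
            sum(w * xs[j][k] for j, w in enumerate(weights))
            for k in range(DIM)
        ])
    return out


def decoder(combined, num_layers=2):
    # Process the combined sequence and return the time step output,
    # here a probability distribution over the toy vocabulary.
    xs = [embed(t) for t in combined]
    for _ in range(num_layers):
        attended = masked_self_attention(xs)
        # Residual connection (an assumption, not recited in the claim).
        xs = [[a + b for a, b in zip(x, y)] for x, y in zip(xs, attended)]
    last = xs[-1]
    logits = [
        sum(a * b for a, b in zip(last, embed(v)))
        for v in range(VOCAB_SIZE)
    ]
    return softmax(logits)


def generate(input_ids, num_steps=3):
    output_ids = []
    for _ in range(num_steps):
        # Combined sequence: input sequence followed by tokens generated so far.
        combined = list(input_ids) + output_ids
        probs = decoder(combined)
        # Greedy selection (assumption); sampling would also fit the claim wording.
        next_id = max(range(VOCAB_SIZE), key=probs.__getitem__)
        output_ids.append(next_id)
    return output_ids


if __name__ == "__main__":
    print(generate([1, 2], num_steps=3))
```

Because each position attends only to itself and earlier positions, the representations computed for a prefix of the combined sequence do not change as later output tokens are appended. That property is what makes the masking "causal" in this sketch.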