| CPC G06N 3/045 (2023.01) [G06N 3/08 (2013.01)] | 18 Claims |

|
1. A computer-implemented method of processing an input sequence in a transformer having an encoder and a decoder, the method comprising:
generating, by one or more processors of a processing system, a first tokenized input sequence based on the input sequence, the first tokenized input sequence comprising a plurality of tokens corresponding to a task;
for each given token of the plurality of tokens, by the one or more processors:
generating a first vector representing the given token;
at a first layer of the encoder:
routing, based on a learned gating function of a mixture-of-experts (MoE) sublayer of the first layer, the first vector to two or more expert feed-forward networks of a set of expert feed-forward networks of the MoE sublayer of the first layer; and
generating a second vector based on processing the first vector in the two or more expert feed-forward networks of the set of expert feed-forward networks of the MoE sublayer of the first layer; and
at a second layer of the encoder including a single feed-forward network sublayer, generating a third vector based on processing the second vector in the single feed-forward network sublayer;
generating, by the one or more processors, a combined encoder output vector corresponding to the task and based on each third vector for each given token of the plurality of tokens; and
for each given element of a plurality of elements in a target sequence vector, by the one or more processors:
generating a fourth vector based on the combined encoder output vector and a target sequence vector;
routing, based on a learned gating function of the decoder, the fourth vector to two or more expert feed-forward networks of a set of expert feed-forward networks of the decoder;
generating a fifth vector based on processing the fourth vector in the two or more expert feed-forward networks of the set of expert feed-forward networks of the decoder; and
modifying the given element of the target sequence vector based on the fifth vector.
|
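The encoder steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the dimensions, the random weights, the ReLU activation, the softmax gate with renormalized top-2 weights, and every function name (`top_k_route`, `moe_sublayer`, `ffn`, `make_ffn`) are assumptions introduced only for illustration, and attention sublayers, residual connections, and normalization are omitted.

```python
import math
import random

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(a - m) for a in z]
    s = sum(e)
    return [a / s for a in e]

def make_ffn(rng, d_model, d_hidden):
    # Hypothetical two-matrix feed-forward network with random weights.
    W1 = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_hidden)]
    W2 = [[rng.gauss(0, 0.1) for _ in range(d_hidden)] for _ in range(d_model)]
    return W1, W2

def ffn(params, x):
    W1, W2 = params
    return matvec(W2, relu(matvec(W1, x)))

def top_k_route(gate_W, x, k=2):
    # Assumed gating function: softmax over expert scores, keep the k
    # highest-scoring experts, renormalize their weights to sum to 1.
    probs = softmax(matvec(gate_W, x))
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]

def moe_sublayer(gate_W, experts, x, k=2):
    # Process x only in the selected experts and combine their outputs
    # as a gate-weighted sum.
    out = [0.0] * len(x)
    for i, w in top_k_route(gate_W, x, k):
        y = ffn(experts[i], x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out

rng = random.Random(0)
d_model, d_hidden, n_experts = 4, 8, 4
gate_W = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(n_experts)]
experts = [make_ffn(rng, d_model, d_hidden) for _ in range(n_experts)]
dense = make_ffn(rng, d_model, d_hidden)

first_vector = [0.5, -1.0, 0.25, 2.0]            # embedding of one token
routes = top_k_route(gate_W, first_vector)       # two (expert, weight) pairs
second_vector = moe_sublayer(gate_W, experts, first_vector)  # first (MoE) layer
third_vector = ffn(dense, second_vector)         # second layer, single FFN
```

Under the same assumptions, the decoder steps would reuse `top_k_route` and `moe_sublayer` with a separate gate and expert set, applied to the fourth vector rather than the first.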