US 12,353,991 B2
Fast decoding in sequence models using discrete latent variables
Lukasz Mieczyslaw Kaiser, Mountain View, CA (US); Aurko Roy, San Francisco, CA (US); Ashish Teku Vaswani, San Francisco, CA (US); Niki Parmar, Sunnyvale, CA (US); Samuel Bengio, Los Altos, CA (US); Jakob D. Uszkoreit, Portola Valley, CA (US); and Noam M. Shazeer, Palo Alto, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Appl. No. 16/968,505
Filed by GOOGLE LLC, Mountain View, CA (US)
PCT Filed Feb. 11, 2019, PCT No. PCT/US2019/017534 § 371(c)(1), (2) Date Aug. 7, 2020, PCT Pub. No. WO2019/157462, PCT Pub. Date Aug. 15, 2019.
Claims priority of provisional application 62/628,912, filed on Feb. 9, 2018.
Prior Publication US 2020/0410344 A1, Dec. 31, 2020
Int. Cl. G06N 3/08 (2023.01)

CPC G06N 3/08 (2013.01)

20 Claims

1. A method of generating an output sequence comprising a plurality of outputs from an input sequence comprising a plurality of inputs, the method comprising:

receiving the input sequence;

processing the input sequence using a latent prediction model configured to autoregressively predict a sequence of discrete latent variables that is shorter than the output sequence, wherein each discrete latent variable in the sequence of discrete latent variables is selected from a discrete set of latent variables, and wherein the latent prediction model is configured to, for each discrete latent variable in the sequence:

select the discrete latent variable from the discrete set of latent variables conditioned on the input sequence and on any discrete latent variables in the sequence that have already been generated; and

processing the input sequence and the predicted sequence of discrete latent variables using a parallel decoder model configured to generate the outputs in the output sequence in parallel from the input sequence and the predicted sequence of discrete latent variables, comprising:

processing, using a first deep neural network within the parallel decoder model, the input sequence and the predicted sequence of discrete latent variables that is generated by the latent prediction model and that is shorter than the output sequence to generate a first sequence that has a same length as the output sequence; and

processing, using a decoder deep neural network, the first sequence that is generated from the input sequence and the predicted sequence of discrete latent variables to generate the output sequence in parallel from the first sequence in a single forward pass.
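
The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not specify any of the functions below, and every name and constant here (`toy_score`, `predict_latents`, `expand`, `decode_parallel`, `generate`, `NUM_LATENT_CODES`, `COMPRESSION`, `VOCAB_SIZE`) is hypothetical. Simple arithmetic stands in for the learned neural networks, so only the structure is meaningful: a short latent sequence is produced one element at a time, expanded to the output length, and then every output is computed independently of the other outputs.

```python
from typing import List

# All constants are illustrative assumptions, not values from the patent.
NUM_LATENT_CODES = 8  # size of the discrete set of latent variables
COMPRESSION = 4       # number of output positions covered by one latent
VOCAB_SIZE = 50       # size of the output vocabulary


def toy_score(inputs: List[int], prefix: List[int], code: int) -> int:
    """Hypothetical stand-in for a learned network scoring one candidate latent."""
    return (sum(inputs) + 3 * sum(prefix) + 7 * len(prefix) + 5 * code * code) % 11


def predict_latents(inputs: List[int], num_latents: int) -> List[int]:
    """Autoregressively select one discrete latent per step."""
    latents: List[int] = []
    for _ in range(num_latents):
        # Each selection is conditioned on the input sequence and on the
        # latents that have already been generated.
        best = max(range(NUM_LATENT_CODES),
                   key=lambda code: toy_score(inputs, latents, code))
        latents.append(best)
    return latents


def expand(inputs: List[int], latents: List[int], out_len: int) -> List[int]:
    """Stand-in for the first network: build a sequence of the output length."""
    context = sum(inputs)
    return [latents[i // COMPRESSION] + context + i for i in range(out_len)]


def decode_parallel(first_seq: List[int]) -> List[int]:
    """Stand-in for the decoder network.

    Each output depends only on its own position in first_seq, never on other
    outputs, so a real system could compute all positions in one forward pass.
    """
    return [(x * 31 + 17) % VOCAB_SIZE for x in first_seq]


def generate(inputs: List[int], out_len: int) -> List[int]:
    """Run the pipeline: latents, then expansion, then parallel decoding."""
    num_latents = -(-out_len // COMPRESSION)  # ceiling division
    latents = predict_latents(inputs, num_latents)
    first_seq = expand(inputs, latents, out_len)
    return decode_parallel(first_seq)
```

Under these assumptions, calling `generate([3, 1, 4, 1, 5], 8)` runs only two sequential steps (one per latent) before producing all eight outputs. This illustrates why a latent sequence shorter than the output sequence reduces the number of sequential decoding steps.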