US 12,106,064 B2
Generating neural network outputs using insertion operations
Jakob D. Uszkoreit, Berlin (DE); Mitchell Thomas Stern, Berkeley, CA (US); Jamie Ryan Kiros, Toronto (CA); and William Chan, Toronto (CA)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Dec. 15, 2022, as Appl. No. 18/082,357.
Application 18/082,357 is a continuation of application No. 16/988,551, filed on Aug. 7, 2020, granted, now 11,556,721.
Application 16/988,551 is a continuation of application No. 16/751,167, filed on Jan. 23, 2020, granted, now 10,740,571, issued on Aug. 11, 2020.
Claims priority of provisional application 62/815,908, filed on Mar. 8, 2019.
Claims priority of provisional application 62/796,038, filed on Jan. 23, 2019.
Prior Publication US 2023/0120410 A1, Apr. 20, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 15/22 (2006.01); G06F 40/44 (2020.01); G06N 3/044 (2023.01); G06N 3/08 (2023.01); G06N 5/04 (2023.01)
CPC G06F 40/44 (2020.01) [G06N 3/044 (2023.01); G06N 3/08 (2013.01); G06N 5/04 (2013.01)]
20 Claims
 
1. A method performed by one or more computers, the method comprising:
receiving a network input; and
generating a network output that represents an output image from the network input, wherein the network output comprises a plurality of outputs from a vocabulary of outputs arranged according to an output order, each of the plurality of outputs corresponding to a respective location in the output image, the generating comprising, at each of a plurality of generation time steps:
identifying a current partial network output that has already been generated as of the generation time step;
generating, using a decoder neural network conditioned on (i) at least a portion of the network input and (ii) the current partial network output, a decoder output that defines, for each of a plurality of insertion locations, a respective score distribution over the vocabulary of outputs, wherein each insertion location is a different new location in the output image at which there is no output in the current partial network output;
selecting, using the decoder output, one or more of the insertion locations and, for each selected insertion location, an inserted output from the vocabulary; and
generating a new partial network output that comprises (i) the current partial network output and (ii) for each selected insertion location, the inserted output from the vocabulary inserted at the corresponding new location in the output image.
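
The generation loop recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: `toy_decoder` is a hypothetical stand-in for the decoder neural network, the names `generate` and `inserts_per_step` are illustrative, and the greedy rule used to pick locations and outputs is an assumption, since the claim only requires that the selection use the decoder output. The output image is modeled as a flat list of locations, with `None` marking a location that has no output yet.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Canvas = List[Optional[int]]      # one slot per image location; None = no output yet
Scores = Dict[int, List[float]]   # insertion location -> distribution over the vocabulary


def toy_decoder(network_input: Sequence[int], partial: Canvas, vocab_size: int) -> Scores:
    """Hypothetical stand-in for the decoder neural network.

    Returns a score distribution over the vocabulary for every location
    that is still empty in the current partial output.
    """
    filled = sum(v is not None for v in partial)  # crude conditioning on the partial output
    scores: Scores = {}
    for loc, value in enumerate(partial):
        if value is not None:
            continue  # only empty locations are insertion locations
        cond = network_input[loc % len(network_input)]
        logits = [((cond + 7 * loc + 3 * v + filled) % 11) / 11 for v in range(vocab_size)]
        z = sum(math.exp(x) for x in logits)
        scores[loc] = [math.exp(x) / z for x in logits]  # softmax
    return scores


def generate(
    network_input: Sequence[int],
    num_locations: int,
    vocab_size: int,
    decoder: Callable[[Sequence[int], Canvas, int], Scores] = toy_decoder,
    inserts_per_step: int = 2,
) -> Tuple[List[int], int]:
    """Fill every location by repeated insertion; returns (output, steps taken)."""
    if inserts_per_step < 1:
        raise ValueError("inserts_per_step must be at least 1")
    partial: Canvas = [None] * num_locations
    steps = 0
    while any(v is None for v in partial):
        # Decoder output: one score distribution per insertion location.
        decoder_output = decoder(network_input, partial, vocab_size)
        # Assumed greedy selection: take the locations whose best score is highest.
        ranked = sorted(decoder_output, key=lambda loc: max(decoder_output[loc]), reverse=True)
        selected = ranked[:inserts_per_step]
        # New partial output = current partial output plus the inserted outputs.
        new_partial = list(partial)
        for loc in selected:
            dist = decoder_output[loc]
            new_partial[loc] = max(range(vocab_size), key=dist.__getitem__)
        partial = new_partial
        steps += 1
    return [v for v in partial if v is not None], steps


if __name__ == "__main__":
    output, steps = generate([3, 1, 4], num_locations=6, vocab_size=4)
    print(output, steps)
```

Because more than one location may be selected per step, the number of generation time steps can be smaller than the number of locations; setting `inserts_per_step=1` in this sketch gives one insertion per step instead.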