US 12,142,034 B2
Attention-based image generation neural networks
Noam M. Shazeer, Palo Alto, CA (US); Lukasz Mieczyslaw Kaiser, San Francisco, CA (US); Jakob D. Uszkoreit, Berlin (DE); Niki J. Parmar, San Francisco, CA (US); and Ashish Teku Vaswani, San Francisco, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Nov. 8, 2023, as Appl. No. 18/388,178.
Application 18/388,178 is a continuation of application No. 17/867,242, filed on Jul. 18, 2022, granted, now 11,816,884.
Application 17/867,242 is a continuation of application No. 17/098,271, filed on Nov. 13, 2020, granted, now 11,392,790, issued on Jul. 19, 2022.
Application 17/098,271 is a continuation of application No. 16/174,074, filed on Oct. 29, 2018, granted, now 10,839,259, issued on Nov. 17, 2020.
Claims priority of provisional application 62/578,390, filed on Oct. 27, 2017.
Prior Publication US 2024/0193926 A1, Jun. 13, 2024
Int. Cl. G06V 10/82 (2022.01); G06F 18/21 (2023.01); G06F 18/213 (2023.01); G06F 18/28 (2023.01); G06N 3/04 (2023.01); G06N 3/084 (2023.01); G06T 3/4053 (2024.01); G06V 10/56 (2022.01); G06V 10/77 (2022.01)

CPC G06V 10/82 (2022.01) [G06F 18/213 (2023.01); G06F 18/217 (2023.01); G06F 18/28 (2023.01); G06N 3/04 (2013.01); G06N 3/084 (2013.01); G06T 3/4053 (2013.01); G06V 10/56 (2022.01); G06V 10/7715 (2022.01)]

20 Claims

1. A method of generating an output image, the output image comprising a plurality of pixels arranged in a two-dimensional map, each pixel having a respective value for each of a plurality of channels, and the method comprising:

receiving a conditioning input;

processing the conditioning input using an encoder neural network to generate a sequential conditioning representation that comprises a sequence of encoded representations;

generating the output image, comprising, at each of a plurality of generation time steps:

generating a current output image representation of the output image as of the generation time step; and

processing the current output image representation using a decoder neural network to update the output image, wherein the decoder neural network comprises a sequence of decoder subnetworks, one or more of the decoder subnetworks comprising a respective encoder-decoder attention sub-layer that is configured to:

receive a respective input for each of a plurality of positions that is derived from the current output image representation; and

generate a respective updated representation for each of the plurality of positions, comprising applying an attention mechanism over the encoded representations in the sequential conditioning representation using one or more queries derived from the respective inputs for each of the plurality of positions.
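
The encoder-decoder attention sub-layer recited in claim 1 can be illustrated with a minimal sketch. The claim requires only "an attention mechanism" whose queries are derived from the per-position inputs and which is applied over the encoded representations. The scaled dot-product form used here, the projection matrices `w_q`, `w_k`, and `w_v`, and every function name are assumptions made for illustration; none of them is taken from the claim text.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def encoder_decoder_attention(position_inputs, encoded, w_q, w_k, w_v):
    """Return one updated representation per decoder position.

    position_inputs: per-position inputs derived from the current output
        image representation.
    encoded: the sequence of encoded representations produced by the
        encoder from the conditioning input.
    w_q, w_k, w_v: hypothetical projection matrices (assumed, not claimed).
    """
    # Keys and values come from the encoder side only.
    keys = [matvec(w_k, e) for e in encoded]
    values = [matvec(w_v, e) for e in encoded]
    scale = math.sqrt(len(keys[0]))  # assumed scaled dot-product attention

    updated = []
    for x in position_inputs:
        # The query is derived from this position's input.
        query = matvec(w_q, x)
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
        weights = softmax(scores)
        # Weighted sum of the values gives the updated representation.
        dim = len(values[0])
        updated.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return updated


# Toy example: identity projections, two encoded vectors, three positions.
identity = [[1.0, 0.0], [0.0, 1.0]]
encoded = [[1.0, 0.0], [0.0, 1.0]]
position_inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
updated = encoder_decoder_attention(
    position_inputs, encoded, identity, identity, identity
)
```

In this toy setup each updated representation is a convex combination of the encoded vectors, weighted toward the encoded vector most similar to that position's query. A full decoder subnetwork would typically combine such a sub-layer with other components, which this sketch leaves out.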