US 12,225,239 B2
High-fidelity generative image compression
George Dan Toderici, Mountain View, CA (US); Fabian Julius Mentzer, Zürich (CH); Eirikur Thor Agustsson, Zürich (CH); and Michael Tobias Tschannen, Zürich (CH)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Aug. 25, 2023, as Appl. No. 18/238,068.
Application 18/238,068 is a continuation of application No. 17/107,684, filed on Nov. 30, 2020, granted, now 11,750,848.
Prior Publication US 2024/0107079 A1, Mar. 28, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06V 10/00 (2022.01); G06N 3/045 (2023.01); G06N 3/088 (2023.01); H04N 19/124 (2014.01); H04N 19/154 (2014.01); H04N 19/91 (2014.01)

CPC H04N 19/91 (2014.11) [G06N 3/045 (2023.01); G06N 3/088 (2013.01); H04N 19/124 (2014.11); H04N 19/154 (2014.11)]

20 Claims

1. A method performed by one or more computers for training an encoder neural network configured to receive a data item and to process the data item in accordance with current values of a plurality of encoder network parameters to output a compressed representation of the data item, wherein the training comprises receiving a plurality of training data items and, for each training data item:

processing the training data item using the encoder neural network to generate a latent representation of the training data item;

generating a compressed representation of the training data item from the latent representation of the training data item;

processing the compressed representation using a decoder neural network to generate a reconstruction of the training data item;

processing the reconstruction of the training data item using a discriminator neural network to generate a discriminator network output that specifies a discriminator's classification of the reconstruction of the training data item;

evaluating a first loss function that depends on (i) a reconstruction term measuring a quality of the reconstruction, and (ii) a discriminator term measuring a difference between the discriminator's classification of the reconstruction of the training data item and a ground truth classification of the training data item; and

determining an update to the current values of the encoder network parameters based on determining a gradient with respect to the encoder network parameters of the first loss function.
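
The steps recited in claim 1 can be illustrated with a minimal, non-authoritative sketch. Everything specific in it is an assumption made for illustration and is not taken from the patent: each "network" is a single scalar weight, the compressed representation is obtained by rounding the latent, the reconstruction term is squared error, the discriminator term is the cross-entropy against the label "real", the weight `beta` balances the two terms, and the gradient passes through the rounding step with a straight-through approximation. All function and parameter names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_enc, w_dec, w_disc, b_disc, quantizer=round):
    """Run encoder, compression, decoder and discriminator on one item."""
    latent = w_enc * x                          # encoder -> latent representation
    compressed = float(quantizer(latent))       # latent -> compressed representation (assumed: rounding)
    recon = w_dec * compressed                  # decoder -> reconstruction
    p_real = sigmoid(w_disc * recon + b_disc)   # discriminator: probability the input is "real"
    return recon, p_real


def first_loss(x, recon, p_real, beta):
    """Reconstruction term plus weighted discriminator term (assumed forms)."""
    reconstruction_term = (x - recon) ** 2      # assumed: squared error
    discriminator_term = -math.log(p_real)      # assumed: cross-entropy against label "real"
    return reconstruction_term + beta * discriminator_term


def encoder_gradient(x, w_enc, w_dec, w_disc, b_disc, beta):
    """Gradient of first_loss with respect to the encoder weight w_enc.

    Rounding has zero derivative almost everywhere, so this sketch treats
    d(compressed)/d(latent) as 1 (a straight-through approximation).
    """
    recon, p_real = forward(x, w_enc, w_dec, w_disc, b_disc)
    dloss_drecon = -2.0 * (x - recon) - beta * (1.0 - p_real) * w_disc
    drecon_dcompressed = w_dec
    dcompressed_dlatent = 1.0                   # straight-through assumption
    dlatent_dwenc = x
    return dloss_drecon * drecon_dcompressed * dcompressed_dlatent * dlatent_dwenc


def train_step(x, w_enc, w_dec, w_disc, b_disc, beta=0.1, lr=0.1):
    """Return the encoder weight after one gradient-descent update."""
    grad = encoder_gradient(x, w_enc, w_dec, w_disc, b_disc, beta)
    return w_enc - lr * grad


if __name__ == "__main__":
    new_w_enc = train_step(x=2.0, w_enc=1.5, w_dec=0.5, w_disc=1.0, b_disc=0.0)
    print("updated encoder weight:", new_w_enc)
```

In this toy setting the update only touches the encoder weight, mirroring the final step of the claim. A practical system would use multi-layer networks and automatic differentiation, and would typically also update the decoder and discriminator, none of which this sketch attempts.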