US 12,382,068 B2
High-performance and low-complexity neural compression from a single image, video or audio data
Emilien Dupont, London (GB); Hyun Jik Kim, London (GB); Matthias Stephan Bauer, London (GB); and Lucas Marvin Theis, London (GB)
Assigned to DeepMind Technologies Limited, London (GB)
Filed by DeepMind Technologies Limited, London (GB)
Filed on Nov. 15, 2024, as Appl. No. 18/949,892.
Claims priority of provisional application 63/600,412, filed on Nov. 17, 2023.
Prior Publication US 2025/0168368 A1, May 22, 2025
Int. Cl. G06K 9/62 (2022.01); H04N 19/119 (2014.01); H04N 19/124 (2014.01); H04N 19/167 (2014.01); H04N 19/189 (2014.01); H04N 19/30 (2014.01)
CPC H04N 19/189 (2014.11) [H04N 19/119 (2014.11); H04N 19/124 (2014.11); H04N 19/167 (2014.11); H04N 19/30 (2014.11)] 30 Claims
OG exemplary drawing
 
1. A method of encoding input data performed by one or more data processing apparatus, the input data comprising input data values corresponding to respective input data grid points of an input data grid, the method comprising:
optimizing an objective function by jointly optimizing parameters of a synthesis neural network, parameters of a decoder neural network, and a set of respective latent values, each respective latent value corresponding to a respective latent grid point of a respective one of a plurality of latent grids having different respective resolutions, and wherein the optimizing comprises, at each of a plurality of optimization iterations:
generating a respective updated value for each of the respective latent values, comprising:
generating a respective noise distribution for the optimization iteration, wherein the respective noise distribution for the optimization iteration has a shape that is more uniform than the respective noise distribution for a preceding optimization iteration;
sampling a respective noise value for each respective latent value from the respective noise distribution for the optimization iteration;
updating each of the respective latent values by adding the respective noise value for the respective latent value to the respective latent value;
processing the respective updated values using the synthesis neural network to generate respective reconstructed data values, each corresponding to a respective input data value of the input data values;
determining a respective probability distribution for each respective updated value using the decoder neural network;
evaluating the objective function by determining (i) an error between the input data values and the respective reconstructed data values and (ii) a compressibility term based on the respective probability distributions;
determining gradients of the objective function with respect to the parameters of the synthesis neural network, the parameters of the decoder neural network, and the set of respective latent values; and
using the gradients to update one or more of: the parameters of the synthesis neural network, the parameters of the decoder neural network, and the respective latent values;
quantizing the optimized respective latent values; and
encoding the quantized respective latent values using a probability distribution for the respective latent values, the probability distribution being defined by the decoder neural network.
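The claimed pipeline (multi-resolution latent grids, annealed noise, a synthesis network for reconstruction, an entropy model for the rate term, then quantization and entropy coding) can be illustrated with a deliberately simplified sketch. This is not the patent's implementation: the "synthesis neural network" is replaced by a fixed linear map, the "decoder neural network" by a per-grid zero-mean Gaussian with a learnable log-scale, the annealing schedule by an interpolation from a triangular toward a uniform noise distribution, and all names, sizes, and learning rates are invented for illustration.

```python
import numpy as np
from math import erf

rng = np.random.default_rng(0)

# Toy "input data": an 8-point 1-D signal standing in for image pixels.
x = np.sin(np.linspace(0, np.pi, 8))

# Latent grids at two resolutions (coarse and fine), per the claim.
latents = [rng.normal(0, 0.1, size=2), rng.normal(0, 0.1, size=4)]

# Stand-in entropy model ("decoder network"): zero-mean Gaussian per grid
# with a learnable log-scale parameter per latent.
log_sigma = [np.zeros(2), np.zeros(4)]

# Stand-in "synthesis network": a fixed linear map (weights frozen for brevity).
W = rng.normal(0, 0.3, size=(8, 16))

def upsample(z, n):
    # Nearest-neighbour upsampling of a latent grid to n points.
    return np.repeat(z, n // len(z))

def synthesize(zs):
    feats = np.concatenate([upsample(z, 8) for z in zs])  # (16,)
    return W @ feats

def rate_bits(zs):
    # Compressibility term: -log2 N(z; 0, sigma) summed over all latents.
    r = 0.0
    for z, ls in zip(zs, log_sigma):
        sigma = np.exp(ls)
        r += np.sum(0.5 * (z / sigma) ** 2 + ls + 0.5 * np.log(2 * np.pi)) / np.log(2)
    return r

def noise(shape, t, T):
    # Anneal the noise shape toward uniform: early iterations lean on a
    # triangular distribution (mean of two uniforms, peaked at 0); the
    # weight on plain Uniform(-0.5, 0.5) grows to 1 by the last iteration.
    w = t / (T - 1)
    tri = (rng.uniform(-0.5, 0.5, shape) + rng.uniform(-0.5, 0.5, shape)) / 2
    return (1 - w) * tri + w * rng.uniform(-0.5, 0.5, shape)

def distortion(zs):
    r = synthesize(zs)
    return float(np.sum((x - r) ** 2))

dist_init = distortion(latents)

T, lr, lam = 300, 0.01, 0.01
for t in range(T):
    # Perturb each latent with this iteration's noise, then reconstruct.
    zn = [z + noise(z.shape, t, T) for z in latents]
    recon = synthesize(zn)
    # Gradients are analytic here because the toy synthesis map is linear.
    g_f = -2 * W.T @ (x - recon)
    grads = [g_f[:8].reshape(2, 4).sum(axis=1),
             g_f[8:].reshape(4, 2).sum(axis=1)]
    for i, (z, zn_i, ls) in enumerate(zip(latents, zn, log_sigma)):
        sigma = np.exp(ls)
        g_rate_z = zn_i / sigma ** 2 / np.log(2)      # d(rate)/dz
        latents[i] = z - lr * (grads[i] + lam * g_rate_z)
        g_ls = (1.0 - (zn_i / sigma) ** 2) / np.log(2)  # d(rate)/d(log_sigma)
        log_sigma[i] = ls - lr * lam * g_ls

dist_final = distortion(latents)

# Quantize the optimized latents and estimate the code length under the
# entropy model: -log2 of the Gaussian probability mass on [q-0.5, q+0.5].
q = [np.round(z) for z in latents]

def code_length_bits(qz, ls):
    sigma = np.exp(ls)
    cdf = lambda v: np.array([0.5 * (1 + erf(a)) for a in v])
    hi = cdf((qz + 0.5) / (sigma * np.sqrt(2)))
    lo = cdf((qz - 0.5) / (sigma * np.sqrt(2)))
    return float(-np.sum(np.log2(np.clip(hi - lo, 1e-12, 1.0))))

total_bits = sum(code_length_bits(qz, ls) for qz, ls in zip(q, log_sigma))
```

In the patent, both the synthesis network and the entropy model are jointly optimized neural networks, and the quantized latents would be written with an actual entropy coder (e.g., arithmetic coding) using the model's probabilities; `total_bits` here only estimates that code length.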