CPC H04N 19/126 (2014.11) [G06N 3/0454 (2013.01); G06N 3/084 (2013.01); G06V 10/774 (2022.01); H04N 19/13 (2014.11)]
16 Claims
1. A computer implemented method of training a first neural network and a second neural network, the neural networks being for use in lossy image or video compression, transmission and decoding, the method including the steps of:
(i) receiving an input training image;
(ii) encoding the input training image using the first neural network, to produce a latent representation;
(iii) quantizing the latent representation to produce a quantized latent;
(iv) using the second neural network to produce an output image from the quantized latent, wherein the output image is an approximation of the input image;
(v) evaluating a loss function based on differences between the output image and the input training image;
(vi) evaluating a gradient of the loss function;
(vii) back-propagating the gradient of the loss function through the second neural network and through the first neural network, to update weights of the second neural network and of the first neural network; and
(viii) repeating steps (i) to (vii) using a set of training images, to produce a trained first neural network and a trained second neural network, and
(ix) storing the weights of the trained first neural network and of the trained second neural network;
wherein the loss function is a weighted sum of a rate term and a distortion term,
wherein split quantisation is used during the evaluation of the gradient of the loss function, with a combination of two quantisation proxies for the rate term and the distortion term.
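
As an illustration only, steps (i) to (ix) can be sketched in a scalar toy setting. Nothing in the sketch below is taken from the patent specification. The one-weight "networks", the unit-variance Gaussian rate model, the hyperparameters and the names `train`, `quantize` and `mean_distortion` are all hypothetical. In particular, the claim does not say which proxy serves which term. This sketch assumes additive uniform noise for the rate term and a straight-through estimator (round in the forward pass, identity gradient in the backward pass) for the distortion term.

```python
import random


def quantize(y):
    """Hard quantisation: round to the nearest integer."""
    return float(round(y))


def mean_distortion(a, b, images):
    """Mean squared error of the toy codec x -> b * round(a * x)."""
    return sum((b * quantize(a * x) - x) ** 2 for x in images) / len(images)


def train(images, rate_weight=0.01, distortion_weight=1.0,
          lr=0.0005, epochs=50, seed=0):
    """Toy rate-distortion training with an assumed split-quantisation scheme.

    The "first neural network" is the single weight a (y = a * x) and the
    "second neural network" is the single weight b (x_hat = b * y_hat).
    Both are stand-ins chosen so the gradients can be written by hand.
    """
    rng = random.Random(seed)
    a, b = 0.5, 0.5
    for _ in range(epochs):                  # (viii) repeat over the set
        for x in images:                     # (i) input training "image"
            y = a * x                        # (ii) latent representation
            y_hat = quantize(y)              # (iii) quantized latent
            x_hat = b * y_hat                # (iv) output "image"

            # Assumed rate proxy: additive uniform noise instead of rounding.
            y_noisy = y + rng.uniform(-0.5, 0.5)
            # Assumed rate model: Gaussian negative log-likelihood, constant dropped.
            rate = 0.5 * y_noisy ** 2
            distortion = (x_hat - x) ** 2
            # (v) weighted sum of a rate term and a distortion term
            loss = rate_weight * rate + distortion_weight * distortion

            # (vi) gradient of the loss, derived by hand for this toy model
            d_xhat = distortion_weight * 2.0 * (x_hat - x)
            grad_b = d_xhat * y_hat
            # Distortion path: straight-through, d(y_hat)/dy treated as 1.
            # Rate path: noise proxy, d(y_noisy)/dy is exactly 1.
            grad_y = d_xhat * b + rate_weight * y_noisy
            grad_a = grad_y * x

            # (vii) update the second and the first "network"
            b -= lr * grad_b
            a -= lr * grad_a
    # (ix) hand the trained weights back for storage
    return {"encoder": a, "decoder": b}
```

Calling `train([4.0, -6.0, 8.0, 10.0, -12.0])` returns a dictionary holding the two trained weights, which a caller could then serialise. The point of the split is that the two terms of the loss see different differentiable stand-ins for the same rounding operation during the gradient evaluation, while the forward reconstruction still uses the hard-quantized latent.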