CPC G06N 3/08 (2013.01) [G06N 3/04 (2013.01)] | 20 Claims |
1. A method of training an encoder neural network and a decoder neural network and of updating a set of latent embedding vectors stored in a memory, wherein:
the encoder neural network is configured to receive an input data item and process the input data item in accordance with a set of encoder network parameters to: generate an encoder output that comprises, for each of one or more latent variables, a respective encoded vector;
the decoder neural network is configured to: receive a decoder input derived from a discrete latent representation of the input data item that is generated from the encoded vectors and the set of latent embedding vectors and process the decoder input in accordance with a set of decoder network parameters to: generate a reconstruction of the input data item, and the method comprises:
receiving a training data item;
processing the training data item through the encoder neural network in accordance with current values of the encoder network parameters of the encoder neural network to generate a training encoder output that comprises, for each of the one or more latent variables, a respective training encoded vector;
selecting, for each latent variable and from a plurality of current latent embedding vectors currently stored in the memory, a current latent embedding vector that is nearest to the training encoded vector for the latent variable;
generating a training decoder input that includes the nearest current latent embedding vectors;
processing the training decoder input through the decoder neural network in accordance with current values of the decoder network parameters of the decoder neural network to generate a training reconstruction of the training data item;
determining a reconstruction update to the current values of the decoder network parameters and the encoder network parameters by determining a gradient with respect to the current values of the decoder network parameters and the encoder network parameters to optimize a reconstruction error between the training reconstruction and the training data item; and
for each latent variable, determining an update to the nearest current latent embedding vector for the latent variable by determining a gradient with respect to the nearest current latent embedding vector to minimize an error between the training encoded vector for the latent variable and the nearest current latent embedding vector to the training encoded vector for the latent variable.
|
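As a non-limiting illustration, the selecting, generating, and embedding-update steps recited in claim 1 might be sketched as follows. The sketch assumes a squared Euclidean distance as the measure of "nearest", a plain gradient-descent step, and a learning rate of 0.25; none of these choices is specified by the claim. The function names (`nearest_index`, `quantize`, `update_codebook`) are hypothetical, and the encoder network, the decoder network, and the reconstruction update are omitted.

```python
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (an assumed choice of distance measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(encoded: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the stored latent embedding vector nearest to one encoded vector."""
    return min(range(len(codebook)), key=lambda k: squared_distance(encoded, codebook[k]))


def quantize(encoder_output: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Select, for each latent variable, the nearest current embedding vector."""
    return [nearest_index(z, codebook) for z in encoder_output]


def update_codebook(encoder_output: Sequence[Sequence[float]],
                    codebook: Sequence[Sequence[float]],
                    indices: Sequence[int],
                    learning_rate: float = 0.25) -> List[Vector]:
    """One gradient-descent step on the sum of ||z - e_k||^2 over latent variables.

    Only the embedding vectors that were selected as nearest are changed.
    """
    updated = [list(e) for e in codebook]
    for z, k in zip(encoder_output, indices):
        # Gradient of ||z - e||^2 with respect to e is 2 * (e - z),
        # evaluated at the current (pre-update) embedding vector.
        grad = [2.0 * (e_i - z_i) for e_i, z_i in zip(codebook[k], z)]
        updated[k] = [u_i - learning_rate * g_i for u_i, g_i in zip(updated[k], grad)]
    return updated


# Two latent variables, a memory holding two 2-dimensional embedding vectors.
codebook = [[0.0, 0.0], [1.0, 1.0]]
training_encoder_output = [[0.9, 0.8], [0.1, -0.2]]

indices = quantize(training_encoder_output, codebook)
# The training decoder input includes the nearest current embedding vectors.
training_decoder_input = [codebook[k] for k in indices]
new_codebook = update_codebook(training_encoder_output, codebook, indices)
```

Under these assumptions, each selected embedding vector moves toward the training encoded vector that selected it, which reduces the error recited in the final step of the claim. Embedding vectors that no latent variable selected are left unchanged.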