US 11,948,075 B2
Generating discrete latent representations of input data items
Koray Kavukcuoglu, London (GB); Aaron Gerard Antonius van den Oord, London (GB); and Oriol Vinyals, London (GB)
Assigned to DeepMind Technologies Limited, London (GB)
Appl. No. 16/620,815
Filed by DEEPMIND TECHNOLOGIES LIMITED, London (GB)
PCT Filed Jun. 11, 2018, PCT No. PCT/EP2018/065308 § 371(c)(1), (2) Date Dec. 9, 2019, PCT Pub. No. WO2018/224690, PCT Pub. Date Dec. 13, 2018.
Claims priority of provisional application 62/517,824, filed on Jun. 9, 2017.
Prior Publication US 2020/0184316 A1, Jun. 11, 2020
Int. Cl. G06N 3/08 (2023.01); G06N 3/04 (2023.01)

CPC G06N 3/08 (2013.01) [G06N 3/04 (2013.01)]

20 Claims

1. A method of training an encoder neural network and a decoder neural network and of updating a set of latent embedding vectors stored in a memory, wherein:

the encoder neural network is configured to receive an input data item and process the input data item in accordance with a set of encoder network parameters to: generate an encoder output that comprises, for each of one or more latent variables, a respective encoded vector;

the decoder neural network is configured to: receive a decoder input derived from a discrete latent representation of the input data item that is generated from the encoded vectors and the set of latent embedding vectors and process the decoder input in accordance with a set of decoder network parameters to: generate a reconstruction of the input data item, and the method comprises:

receiving a training data item;

processing the training data item through the encoder neural network in accordance with current values of the encoder network parameters of the encoder neural network to generate a training encoder output that comprises, for each of the one or more latent variables, a respective training encoded vector;

selecting, for each latent variable and from a plurality of current latent embedding vectors currently stored in the memory, a current latent embedding vector that is nearest to the training encoded vector for the latent variable;

generating a training decoder input that includes the nearest current latent embedding vectors;

processing the training decoder input through the decoder neural network in accordance with current values of the decoder network parameters of the decoder neural network to generate a training reconstruction of the training data item;

determining a reconstruction update to the current values of the decoder network parameters and the encoder network parameters by determining a gradient with respect to the current values of the decoder network parameters and the encoder network parameters to optimize a reconstruction error between the training reconstruction and the training data item; and

for each latent variable, determining an update to the nearest current latent embedding vector for the latent variable by determining a gradient with respect to the nearest current latent embedding vector to minimize an error between the training encoded vector for the latent variable and the nearest current latent embedding vector to the training encoded vector for the latent variable.
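
By way of illustration only, the selecting step and the embedding update step recited above may be sketched as follows. The sketch assumes Euclidean distance as the measure of "nearest", a squared error between the training encoded vector and the selected embedding vector, and a plain gradient descent step with a hypothetical learning rate `lr`; none of these choices is stated in the claim. The function names (`nearest_index`, `quantize`, `update_codebook`) are hypothetical. The encoder neural network, the decoder neural network, and the reconstruction update are omitted, and the encoded vectors are supplied directly as lists of numbers.

```python
# Toy sketch of nearest-embedding selection and the embedding update.
# All names, the Euclidean metric, and the learning rate are assumptions
# made for illustration; they are not taken from the claim.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_index(encoded, codebook):
    """Index of the stored embedding vector nearest to `encoded`."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(encoded, codebook[k]))

def quantize(encoded_vectors, codebook):
    """For each latent variable, select the nearest stored embedding.

    Returns the selected indices (a discrete latent representation) and
    a decoder input made of copies of the selected embedding vectors.
    """
    indices = [nearest_index(z, codebook) for z in encoded_vectors]
    decoder_input = [list(codebook[k]) for k in indices]
    return indices, decoder_input

def update_codebook(encoded_vectors, indices, codebook, lr=0.25):
    """One gradient descent step on ||z - e_k||^2 with respect to e_k.

    The gradient of ||z - e||^2 with respect to e is -2 * (z - e), so
    each step moves the selected embedding toward its encoded vector.
    If several latent variables select the same embedding, the steps
    are applied one after another, in place.
    """
    for z, k in zip(encoded_vectors, indices):
        e = codebook[k]
        grad = [-2.0 * (zi - ei) for zi, ei in zip(z, e)]
        codebook[k] = [ei - lr * gi for ei, gi in zip(e, grad)]
    return codebook

# Usage: two latent variables, a memory holding two embedding vectors.
example_codebook = [[0.0, 0.0], [4.0, 4.0]]
example_encoded = [[1.0, 0.0], [3.0, 5.0]]
example_indices, example_decoder_input = quantize(example_encoded, example_codebook)
print(example_indices)        # [0, 1]
print(example_decoder_input)  # [[0.0, 0.0], [4.0, 4.0]]
update_codebook(example_encoded, example_indices, example_codebook)
print(example_codebook)       # [[0.5, 0.0], [3.5, 4.5]]
```

With `lr=0.25`, each selected embedding vector moves halfway toward its training encoded vector, because the step is `lr` times a gradient of magnitude `2 * (z - e)`. A fuller implementation would also pass the decoder input through the decoder neural network and apply the reconstruction update to the encoder and decoder network parameters.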