US 12,277,672 B2
Generative adversarial networks with temporal and spatial discriminators for efficient video generation
Aidan Clark, London (GB); Jeffrey Donahue, London (GB); and Karen Simonyan, London (GB)
Assigned to DeepMind Technologies Limited, London (GB)
Appl. No. 17/613,694
Filed by DeepMind Technologies Limited, London (GB)
PCT Filed May 22, 2020, PCT No. PCT/EP2020/064270 § 371(c)(1), (2) Date Nov. 23, 2021, PCT Pub. No. WO2020/234449, PCT Pub. Date Nov. 26, 2020.
Claims priority of provisional application 62/852,176, filed on May 23, 2019.
Prior Publication US 2022/0230276 A1, Jul. 21, 2022
Int. Cl. G06T 3/4046 (2024.01); G06N 3/045 (2023.01); G06N 3/08 (2023.01)

CPC G06T 3/4046 (2013.01) [G06N 3/045 (2023.01); G06N 3/08 (2013.01); G06T 2207/20081 (2013.01)]

13 Claims

1. A computer-implemented method for training a discriminator network for use in training a generator to generate a sequence of images representing a temporal progression, the discriminator network being for distinguishing between sequences of images generated by the generator network and sequences of images which are not generated by the generator network, the discriminator network comprising a temporal discriminator network for discriminating based on temporal features and a spatial discriminator network for discriminating based on spatial features, the temporal discriminator network and the spatial discriminator network each comprising a multi-layer network of neurons in which each layer performs a function defined by corresponding weights, the method comprising:

receiving an input sequence of images representing a temporal progression;

forming, from the input sequence, a first set of images having a lower temporal resolution than the input sequence, and inputting the first set into the spatial discriminator network to determine, based on the spatial features of each image in the first set, a first discriminator score representing a probability that the input sequence has been generated by the generator network,

wherein the first set comprises more than one image and determining the first discriminator score comprises: determining, for each image in the first set, a corresponding discriminator value representing a probability that the image was generated by the generator network, and combining the discriminator values for the images in the first set to produce the first discriminator score, and

wherein each discriminator value is determined based on only a single corresponding image from the first set;

forming, from the input sequence, a second set of images having a lower spatial resolution than the input sequence, and inputting the second set into the temporal discriminator network to determine, based on the temporal features of the images in the second set, a second discriminator score representing a probability that the input sequence has been generated by the generator network; and

varying weights of the discriminator network based on the first discriminator score and the second discriminator score,

wherein the first set and the second set include images that span the same time interval during which the input sequence of images are taken, wherein the first set has fewer images than the second set, and wherein each image in the second set has a lower spatial resolution than the images in the first set.
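For illustration only, the following minimal, dependency-free Python sketch walks through one discriminator training step of the kind recited in claim 1. It is not the patentee's implementation, and it rests on these assumptions. Frames are small grayscale images stored as nested lists. The first set is `k` evenly spaced full-resolution frames; the first and last frames are kept so the set spans the same time interval, and `k` is smaller than the number of input frames. The second set is every input frame downsampled by 2x2 average pooling. The claim recites multi-layer networks, but the scorers `spatial_score` and `temporal_score` (with weights `ws`, `bs`, `wt`, `bt`) are hypothetical single-unit logistic stand-ins. Per-image values are combined by averaging, and the claim leaves the combination method open. The binary cross-entropy loss and the finite-difference weight update are likewise assumptions.

```python
import math
from typing import Dict, List

Frame = List[List[float]]  # one grayscale image as rows of pixel values (assumption)
Params = Dict[str, float]   # hypothetical toy discriminator weights


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


def first_set(video: List[Frame], k: int) -> List[Frame]:
    """Lower temporal resolution: k evenly spaced, full-resolution frames.
    Index 0 and index n-1 are always chosen, so the set spans the input's time interval."""
    n = len(video)
    if not 2 <= k < n:  # more than one image, and fewer images than the second set
        raise ValueError("need 2 <= k < len(video)")
    return [video[i * (n - 1) // (k - 1)] for i in range(k)]


def downsample_2x(frame: Frame) -> Frame:
    """2x2 average pooling (assumes even height and width)."""
    return [[(frame[r][c] + frame[r][c + 1] + frame[r + 1][c] + frame[r + 1][c + 1]) / 4.0
             for c in range(0, len(frame[0]) - 1, 2)]
            for r in range(0, len(frame) - 1, 2)]


def second_set(video: List[Frame]) -> List[Frame]:
    """Lower spatial resolution: every frame of the input, each downsampled."""
    return [downsample_2x(f) for f in video]


def spatial_score(images: List[Frame], p: Params) -> float:
    # Hypothetical per-image discriminator: each value depends on ONE image only
    # (a toy feature: its mean brightness). The values are combined by averaging.
    values = [sigmoid(p["ws"] * mean([v for row in img for v in row]) + p["bs"])
              for img in images]
    return mean(values)


def temporal_score(images: List[Frame], p: Params) -> float:
    # Hypothetical temporal discriminator: scores the whole set jointly, using the
    # mean absolute change between consecutive images as a toy temporal feature.
    diffs = [abs(a - b)
             for prev, cur in zip(images, images[1:])
             for ra, rb in zip(prev, cur)
             for a, b in zip(ra, rb)]
    return sigmoid(p["wt"] * mean(diffs) + p["bt"])


def loss(video: List[Frame], generated: bool, p: Params, k: int) -> float:
    """Binary cross-entropy on both scores; target 1 means 'generated by the generator'."""
    target = 1.0 if generated else 0.0
    total = 0.0
    for s in (spatial_score(first_set(video, k), p), temporal_score(second_set(video), p)):
        s = min(max(s, 1e-7), 1 - 1e-7)  # clamp to keep log() finite
        total -= target * math.log(s) + (1 - target) * math.log(1 - s)
    return total


def train_step(batch, p: Params, k: int, lr: float = 0.1, eps: float = 1e-5) -> Params:
    """Vary the weights by one gradient-descent step. Gradients are estimated by
    forward finite differences only to keep the sketch dependency-free."""
    def batch_loss(q: Params) -> float:
        return sum(loss(v, g, q, k) for v, g in batch) / len(batch)

    base = batch_loss(p)
    new_p = dict(p)
    for name in p:
        bumped = dict(p)
        bumped[name] += eps
        new_p[name] = p[name] - lr * (batch_loss(bumped) - base) / eps
    return new_p
```

In practice the two discriminators would be trained with automatic differentiation instead of finite differences. The sketch keeps the structural constraints of the claim visible: the first set has more than one but fewer images than the second set, it spans the same interval, and every image in the second set has a lower spatial resolution than those in the first.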