US 12,266,161 B2
Device and method for training a machine learning system for generating images
Anna Khoreva, Stuttgart (DE); and Vadim Sushko, Stuttgart (DE)
Assigned to ROBERT BOSCH GMBH, Stuttgart (DE)
Filed by Robert Bosch GmbH, Stuttgart (DE)
Filed on Jan. 31, 2022, as Appl. No. 17/649,481.
Claims priority of application No. 21157926 (EP), filed on Feb. 18, 2021.
Prior Publication US 2022/0262106 A1, Aug. 18, 2022
Int. Cl. G06V 10/82 (2022.01); G06N 3/045 (2023.01); G06N 3/08 (2023.01); G06V 10/764 (2022.01)
CPC G06V 10/82 (2022.01) [G06N 3/045 (2023.01); G06N 3/08 (2013.01); G06V 10/764 (2022.01)]
14 Claims
 
1. A computer-implemented method for training a generative adversarial network, wherein a generator of the generative adversarial network is configured to generate at least one image based on at least one input value, the method comprising:
training the generative adversarial network by maximizing a loss function that characterizes a difference between a first image determined by the generator for at least one first input value and a third image that is in addition to a provided second image, the third image being determined by the generator for at least one second input value;
determining, by a discriminator of the generative adversarial network, a first output characterizing two classifications of the first image, and determining, by the discriminator, a second output characterizing two classifications of the provided second image, wherein the discriminator is configured to determine an output for a supplied image according to the following steps:
determining an intermediate representation of the supplied image;
determining a content representation of the supplied image by applying a global pooling operation to the intermediate representation;
determining a layout representation of the supplied image by applying a convolutional operation to the intermediate representation;
determining a content value characterizing a classification of the content representation and a layout value characterizing a classification of the layout representation and providing the content value and layout value in the output for the supplied image;
training the discriminator such that the content value and layout value in the first output characterize a classification into a first class and such that the content value and layout value in the second output characterize a classification into a second class; and
training the generator such that the content value and layout value in the first output characterize a classification into the second class.
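The discriminator steps and the two training objectives recited in claim 1 can be illustrated with a minimal sketch in plain Python. It is not the patented implementation. The names (`TwoBranchDiscriminator`, `conv1x1`, `image_difference`), the 1x1 convolutions, the averaging of the layout map into a single layout value, the binary cross-entropy objective, the 0/1 encoding of the first and second class, and the mean-absolute-difference measure are all assumptions chosen for illustration. Random arrays stand in for the generator's images and for the provided image.

```python
import math
import random


def conv1x1(x, weight):
    """1x1 convolution. x: [C_in][H][W], weight: [C_out][C_in] -> [C_out][H][W]."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    return [[[sum(kernel[c] * x[c][i][j] for c in range(c_in))
              for j in range(w)]
             for i in range(h)]
            for kernel in weight]


def relu(x):
    return [[[max(0.0, v) for v in row] for row in channel] for channel in x]


def global_avg_pool(x):
    """Global pooling: reduce each channel's H x W map to one number."""
    return [sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
            for channel in x]


def bce_with_logit(logit, target):
    """Numerically stable binary cross-entropy for a raw logit and a 0/1 target."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


# Assumed encoding: first class = generated image, second class = provided image.
FIRST_CLASS, SECOND_CLASS = 0.0, 1.0


class TwoBranchDiscriminator:
    """Hypothetical toy discriminator with a content branch and a layout branch."""

    def __init__(self, in_ch=3, feat_ch=4, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

        self.w_feat = rand(feat_ch, in_ch)    # shared feature extractor
        self.w_layout = rand(1, feat_ch)      # convolution squeezing channels to 1
        self.w_content = rand(1, feat_ch)[0]  # linear classifier on pooled features

    def __call__(self, image):
        features = relu(conv1x1(image, self.w_feat))       # intermediate representation
        content_repr = global_avg_pool(features)           # [feat_ch], spatial dims pooled away
        layout_repr = conv1x1(features, self.w_layout)[0]  # [H][W], channel dim squeezed
        content_value = sum(w * v for w, v in zip(self.w_content, content_repr))
        # Assumption: the layout value is the mean of the per-location logits.
        n_pixels = len(layout_repr) * len(layout_repr[0])
        layout_value = sum(sum(row) for row in layout_repr) / n_pixels
        # The representations are returned only so they can be inspected.
        return {"content": content_value, "layout": layout_value,
                "content_repr": content_repr, "layout_repr": layout_repr}


def classification_loss(output, target):
    """Sum of the losses on the content value and the layout value."""
    return (bce_with_logit(output["content"], target)
            + bce_with_logit(output["layout"], target))


def discriminator_loss(first_output, second_output):
    """Generated image -> first class, provided image -> second class."""
    return (classification_loss(first_output, FIRST_CLASS)
            + classification_loss(second_output, SECOND_CLASS))


def generator_loss(first_output):
    """The generator is trained so its image is classified into the second class."""
    return classification_loss(first_output, SECOND_CLASS)


def image_difference(img_a, img_b):
    """Mean absolute pixel difference (one possible difference measure)."""
    diffs = [abs(a - b)
             for ch_a, ch_b in zip(img_a, img_b)
             for row_a, row_b in zip(ch_a, ch_b)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def random_image(rng, ch=3, h=4, w=4):
    return [[[rng.random() for _ in range(w)] for _ in range(h)] for _ in range(ch)]


rng = random.Random(1)
first_image = random_image(rng)   # stands in for the generator output for a first input value
second_image = random_image(rng)  # stands in for the provided second image
third_image = random_image(rng)   # stands in for the generator output for a second input value

disc = TwoBranchDiscriminator()
first_output, second_output = disc(first_image), disc(second_image)
d_loss = discriminator_loss(first_output, second_output)
g_loss = generator_loss(first_output)
diversity = image_difference(first_image, third_image)  # the term the claim maximizes
```

In a real training loop the weights would be updated by gradient descent on `d_loss` and `g_loss`, and the generator would additionally be pushed to increase `diversity`. The sketch only evaluates the quantities once to show how the content and layout values enter each objective.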