US 12,462,526 B2
Training method for a generator for generating realistic images
Anna Khoreva, Stuttgart (DE); Edgar Schoenfeld, Tuebingen (DE); Vadim Sushko, Stuttgart (DE); and Dan Zhang, Leonberg (DE)
Assigned to ROBERT BOSCH GMBH, Stuttgart (DE)
Appl. No. 17/999,000
Filed by Robert Bosch GmbH, Stuttgart (DE)
PCT Filed Aug. 20, 2021, PCT No. PCT/EP2021/073127 § 371(c)(1), (2) Date Nov. 16, 2022, PCT Pub. No. WO2022/043204, PCT Pub. Date Mar. 3, 2022.
Claims priority of application No. 10 2020 210 710.6 (DE), filed on Aug. 24, 2020.
Prior Publication US 2023/0177809 A1, Jun. 8, 2023
Int. Cl. G06V 10/764 (2022.01); G06T 11/00 (2006.01); G06V 10/774 (2022.01); G06V 10/776 (2022.01); G06V 10/82 (2022.01); G06V 20/56 (2022.01); G06V 20/70 (2022.01)

CPC G06V 10/764 (2022.01) [G06T 11/00 (2013.01); G06V 10/774 (2022.01); G06V 10/776 (2022.01); G06V 10/82 (2022.01); G06V 20/56 (2022.01); G06V 20/70 (2022.01)]

11 Claims

1. A method for training a generator for images from a semantic map that assigns each pixel of the image a semantic meaning of an object to which that pixel belongs, the method comprising the following steps:

providing actual training images and associated semantic training maps that assign a semantic meaning to each pixel of the associated training image;

generating images from at least one of the semantic training maps using the generator;

determining at least one actual training image in relation to the same at least one of the semantic training maps;

generating a mixed image from at least one image generated by the generator and at least one determined actual training image, wherein in the mixed image, a first genuine subset of pixels is occupied by relevant corresponding pixel values of the image generated by the generator and a remaining genuine subset of pixels is occupied by relevant corresponding pixel values of the actual training image;

supplying the images generated by the generator, the at least one actual training image, and at least one mixed image, which belong to the same semantic training map, to a discriminator, which is configured to distinguish images generated by the generator from actual images of scenery predefined by the semantic training map;

optimizing generator parameters characterizing a behavior of the generator so that the discriminator misclassifies the images generated by the generator as actual images; and

optimizing discriminator parameters characterizing a behavior of the discriminator so that an accuracy of a distinction between generated images and actual images is improved, wherein contiguous regions of pixels of the mixed image to which the same semantic meaning is assigned by the semantic training map are occupied either uniformly by corresponding pixel values of the image generated by the generator or uniformly by corresponding pixel values of the actual training image.
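
The mixing step recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and it rests on several assumptions: "genuine subset" is read as a proper, non-empty subset; contiguity is taken as 4-connectivity; each region is assigned to the generated or the actual image by a fair coin flip; and the names `label_regions` and `mix_images` are hypothetical. The sketch labels the contiguous regions of equal semantic meaning and then fills each region uniformly from one of the two images.

```python
from collections import deque

import numpy as np


def label_regions(semantic_map):
    """Label 4-connected regions of equal class (assumed notion of 'contiguous').

    Returns an (H, W) array of region ids and the number of regions.
    """
    h, w = semantic_map.shape
    regions = np.full((h, w), -1, dtype=np.int64)
    next_id = 0
    for r in range(h):
        for c in range(w):
            if regions[r, c] != -1:
                continue  # pixel already belongs to a labeled region
            cls = semantic_map[r, c]
            regions[r, c] = next_id
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill over same-class neighbors
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and regions[ny, nx] == -1
                            and semantic_map[ny, nx] == cls):
                        regions[ny, nx] = next_id
                        queue.append((ny, nx))
            next_id += 1
    return regions, next_id


def mix_images(generated, actual, semantic_map, rng):
    """Build a mixed image in which every region comes wholly from one source.

    generated, actual: (H, W, C) arrays; semantic_map: (H, W) integer classes.
    Returns the mixed image and a boolean mask (True = generated pixel).
    """
    regions, n_regions = label_regions(semantic_map)
    if n_regions < 2:
        raise ValueError("need at least two regions to form two proper subsets")
    take_generated = rng.random(n_regions) < 0.5  # assumed: fair coin per region
    # Force both subsets to be non-empty so neither image is copied wholesale.
    if take_generated.all():
        take_generated[rng.integers(n_regions)] = False
    elif not take_generated.any():
        take_generated[rng.integers(n_regions)] = True
    mask = take_generated[regions]  # broadcast the per-region choice to pixels
    mixed = np.where(mask[..., None], generated, actual)
    return mixed, mask


# Toy example: a 3x3 map with three regions, an all-zero "generated" image
# and an all-one "actual" image.
semantic_map = np.array([[0, 0, 1],
                         [0, 1, 1],
                         [2, 2, 1]])
generated = np.zeros((3, 3, 3))
actual = np.ones((3, 3, 3))
mixed, mask = mix_images(generated, actual, semantic_map, np.random.default_rng(0))
```

In this sketch the mixed image, the generated image, and the actual image all share the same semantic map, so the three could be passed to a discriminator together as the claim describes. The choice of a per-region coin flip is only one way to satisfy the uniformity condition; any rule that assigns whole regions to a single source would do.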