US 12,136,230 B2
Method for training neural network, system for training neural network, and neural network
Sven Meier, Wilsele (BE); Octave Mariotti, Paris (FR); Hakan Bilen, Edinburgh (GB); and Oisin MacAodha, Edinburgh (GB)
Assigned to TOYOTA JIDOSHA KABUSHIKI KAISHA, Toyota (JP); and THE UNIVERSITY COURT OF THE UNIVERSITY OF EDINBURGH, South Bridge Edinburgh (GB)
Filed by TOYOTA JIDOSHA KABUSHIKI KAISHA, Toyota (JP); and The University Court of the University of Edinburgh, South Bridge Edinburgh (GB)
Filed on Apr. 11, 2022, as Appl. No. 17/717,546.
Claims priority of application No. 21167942 (EP), filed on Apr. 12, 2021.
Prior Publication US 2022/0327730 A1, Oct. 13, 2022
Int. Cl. G06T 7/70 (2017.01); G06N 3/045 (2023.01); G06T 15/06 (2011.01)
CPC G06T 7/70 (2017.01) [G06N 3/045 (2023.01); G06T 15/06 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/30252 (2013.01)]
13 Claims
 
1. A method for training a first neural network to detect a viewpoint of an object visible on an image, centered, and belonging to a given category of object when this image is inputted to the first neural network, comprising:
providing a dataset of pairs of images, each pair of images comprising a first image on which an object belonging to said category is visible under a first viewpoint and centered, and a second image on which the same object is visible under a second viewpoint which differs from the first viewpoint, and centered,
providing a second neural network configured to be able to deliver appearance information of an object visible on an image and belonging to said category when this image is inputted to the second neural network,
providing a third neural network configured to be able to deliver a synthetic image of an object of said category when appearance information of an object and a viewpoint are inputted to the third neural network,
jointly training the first neural network, the second neural network, and the third neural network by adapting parameters of the first neural network, the second neural network, and the third neural network so as to minimize a distance between:
at least a portion of the first image of a pair of images from the dataset of pairs of images, this portion showing the object visible on the image, and
a synthetic image delivered by the third neural network after it receives as input a viewpoint delivered by the first neural network after inputting the first image to the first neural network and appearance information delivered by the second neural network after inputting the second image of the pair to the second neural network.
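
The data flow recited in claim 1 can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the patented implementation: images are flat lists of four pixel values, the viewpoint and the appearance information are each reduced to a single scalar, the three networks are tiny linear maps, the distance is a squared Euclidean distance, and gradients are estimated by finite differences with a simple backtracking step instead of backpropagation. All names (viewpoint_net, appearance_net, decoder_net, pair_loss, train_jointly) are hypothetical and do not appear in the patent.

```python
import random

def viewpoint_net(params, image):
    # First network (toy): image -> viewpoint, here a single scalar.
    return sum(w * x for w, x in zip(params["view"], image))

def appearance_net(params, image):
    # Second network (toy): image -> appearance information, here a single scalar.
    return sum(w * x for w, x in zip(params["app"], image))

def decoder_net(params, appearance, viewpoint):
    # Third network (toy): (appearance, viewpoint) -> synthetic image.
    return [a * appearance + v * viewpoint
            for a, v in zip(params["dec_app"], params["dec_view"])]

def pair_loss(params, first_image, second_image):
    # Viewpoint comes from the FIRST image, appearance from the SECOND image;
    # the synthetic image is compared against the FIRST image.
    viewpoint = viewpoint_net(params, first_image)
    appearance = appearance_net(params, second_image)
    synthetic = decoder_net(params, appearance, viewpoint)
    return sum((s - x) ** 2 for s, x in zip(synthetic, first_image))

def dataset_loss(params, pairs):
    # Mean distance over all image pairs in the dataset.
    return sum(pair_loss(params, a, b) for a, b in pairs) / len(pairs)

def numerical_gradient(params, pairs, eps=1e-5):
    # Central finite differences over the parameters of all three networks.
    grad = {}
    for name, values in params.items():
        grad[name] = []
        for i in range(len(values)):
            original = values[i]
            values[i] = original + eps
            up = dataset_loss(params, pairs)
            values[i] = original - eps
            down = dataset_loss(params, pairs)
            values[i] = original  # restore the parameter exactly
            grad[name].append((up - down) / (2 * eps))
    return grad

def train_jointly(params, pairs, steps=100, lr=0.05):
    # Joint training: every step updates the parameters of all three networks.
    for _ in range(steps):
        grad = numerical_gradient(params, pairs)
        current = dataset_loss(params, pairs)
        step = lr
        while step > 1e-8:  # halve the step until the loss actually decreases
            trial = {k: [p - step * g for p, g in zip(params[k], grad[k])]
                     for k in params}
            if dataset_loss(trial, pairs) < current:
                params = trial
                break
            step /= 2
    return params

random.seed(0)
N_PIXELS = 4
# Toy dataset: each pair stands in for two centered views of the same object.
pairs = [([random.random() for _ in range(N_PIXELS)],
          [random.random() for _ in range(N_PIXELS)]) for _ in range(3)]
init_params = {name: [random.uniform(-0.5, 0.5) for _ in range(N_PIXELS)]
               for name in ("view", "app", "dec_app", "dec_view")}

loss_before = dataset_loss(init_params, pairs)
trained_params = train_jointly(init_params, pairs)
loss_after = dataset_loss(trained_params, pairs)
```

In this sketch the only supervision is the reconstruction distance. Because the appearance information is taken from the second image while the target is the first image, the viewpoint output of the first network is the only path by which view-dependent information about the first image can reach the third network.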