US 11,809,524 B2
System and method for training an adapter network to improve transferability to real-world datasets
Sergey Zakharov, San Francisco, CA (US); Wadim Kehl, Tokyo (JP); Vitor Guizilini, Santa Clara, CA (US); and Adrien David Gaidon, Mountain View, CA (US)
Assigned to Woven Planet North America, Inc., Los Altos, CA (US); and Toyota Research Institute, Inc., Los Altos, CA (US)
Filed by Toyota Research Institute, Inc., Los Altos, CA (US); and Woven Planet North America, Inc., Los Altos, CA (US)
Filed on Jul. 23, 2021, as Appl. No. 17/384,008.
Claims priority of provisional application 63/161,794, filed on Mar. 16, 2021.
Prior Publication US 2022/0300770 A1, Sep. 22, 2022
Int. Cl. G06F 18/21 (2023.01); G06T 19/20 (2011.01); G06V 10/24 (2022.01); G06V 20/64 (2022.01); G06F 18/213 (2023.01); G06F 18/214 (2023.01)
CPC G06F 18/2185 (2023.01) [G06F 18/213 (2023.01); G06F 18/2148 (2023.01); G06T 19/20 (2013.01); G06V 10/242 (2022.01); G06V 20/64 (2022.01); G06T 2207/10024 (2013.01); G06T 2207/20084 (2013.01); G06T 2210/12 (2013.01); G06T 2219/2021 (2013.01)]
20 Claims
1. A system comprising:
a processor; and
a memory in communication with the processor and having machine-readable instructions that, when executed by the processor, cause the processor to:
output, using a neural network that utilizes an input image that includes an object, a predicted scene that includes a three-dimensional bounding box having pose information of the object, wherein the neural network generates, in an intermediate operation, an output map indicating at least one of a shape of the object and a surface normal of the object,
generate, using a differentiable renderer and based on the predicted scene, a rendered map of the object, the rendered map including at least one of a rendered shape of the object and a rendered surface normal of the object, and
train an adapter network, which adapts the predicted scene to adjust for a deformation of the input image, by comparing the rendered map to the output map, wherein the output map is a ground truth.
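The training step recited in claim 1 can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the claimed implementation: the "differentiable renderer" is a hypothetical Gaussian soft-shape map on a 16x16 grid, the "predicted scene" is reduced to a 2D object center, and the "adapter network" is reduced to a learnable two-parameter offset applied to that center. The names `render`, `loss_and_grad`, and `train_adapter` and all numeric values are illustrative and do not come from the patent. The output map is held fixed and treated as the ground truth, and only the adapter parameters are updated by comparing the rendered map to it.

```python
import math

SIZE, SIGMA = 16, 3.0  # hypothetical map resolution and blob width


def render(cx, cy):
    """Toy differentiable renderer: soft shape map for an object centered at (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * SIGMA ** 2))
             for x in range(SIZE)] for y in range(SIZE)]


def loss_and_grad(cx, cy, target):
    """Sum of squared differences between rendered and target maps, with its
    analytic gradient with respect to the rendered center (cx, cy)."""
    rendered = render(cx, cy)
    loss = gx = gy = 0.0
    for y in range(SIZE):
        for x in range(SIZE):
            r = rendered[y][x]
            diff = r - target[y][x]
            loss += diff * diff
            # d(r)/d(cx) = r * (x - cx) / SIGMA^2, and likewise for cy
            gx += 2 * diff * r * (x - cx) / SIGMA ** 2
            gy += 2 * diff * r * (y - cy) / SIGMA ** 2
    return loss, gx, gy


def train_adapter(predicted_center, output_map, steps=200, lr=0.1):
    """Fit a pose offset (stand-in for the adapter) by gradient descent so that
    the rendered map matches the output map, which is used as ground truth."""
    dx = dy = 0.0
    history = []
    for _ in range(steps):
        loss, gx, gy = loss_and_grad(predicted_center[0] + dx,
                                     predicted_center[1] + dy, output_map)
        history.append(loss)
        dx -= lr * gx  # the adapted center is predicted + offset, so the
        dy -= lr * gy  # gradient w.r.t. the offset equals that w.r.t. the center
    return (dx, dy), history


# Assumed setup: the intermediate output map places the object at (9, 7),
# while the predicted scene places it at (7, 8).
output_map = render(9.0, 7.0)
predicted_center = (7.0, 8.0)
offset, history = train_adapter(predicted_center, output_map)
print("learned offset:", offset)
print("initial loss:", history[0], "final loss:", history[-1])
```

In this sketch no external labels are used: the supervision signal comes only from the network's own intermediate output map, which is the sense in which the claim calls the output map a ground truth. An actual system would presumably replace the offset with a trained network and the Gaussian blob with a renderer producing shape or surface-normal maps from the three-dimensional bounding box and pose.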