US 11,868,439 B2
Mixed-batch training of a multi-task network
Vitor Guizilini, Santa Clara, CA (US); Adrien David Gaidon, Mountain View, CA (US); Jie Li, Los Altos, CA (US); and Rares A. Ambrus, San Francisco, CA (US)
Assigned to Toyota Research Institute, Inc., Los Altos, CA (US)
Filed by Toyota Research Institute, Inc., Los Altos, CA (US)
Filed on Mar. 29, 2021, as Appl. No. 17/215,646.
Claims priority of provisional application 63/113,477, filed on Nov. 13, 2020.
Prior Publication US 2022/0156525 A1, May 19, 2022
Int. Cl. G06F 18/21 (2023.01); G06T 9/00 (2006.01); G06V 20/56 (2022.01); G06V 20/64 (2022.01); G06T 7/73 (2017.01); G06T 7/50 (2017.01); G06F 18/214 (2023.01)

CPC G06F 18/2178 (2023.01) [G06F 18/2148 (2023.01); G06T 7/50 (2017.01); G06T 7/74 (2017.01); G06T 9/002 (2013.01); G06V 20/56 (2022.01); G06V 20/64 (2022.01); G06T 2207/10024 (2013.01); G06T 2207/10028 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01)]

17 Claims

1. A perception system, comprising:

one or more processors; and

a memory communicably coupled to the one or more processors and storing:

a network module including instructions that, when executed by the one or more processors, cause the one or more processors to:

acquire training data that includes real data and virtual data for training a multi-task network that performs at least depth prediction and semantic segmentation, the virtual data including synthetic images;

generate a first output from the multi-task network using the real data and a second output from the multi-task network using the virtual data over separate executions of the multi-task network;

generate a mixed loss by analyzing the first output to produce a real loss that includes at least a self-supervised loss and the second output to produce a virtual loss that includes a supervised loss comprised of a semantic loss, a depth loss, a surface normal loss, and a synthesis loss; and

update the multi-task network using the mixed loss.
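
As an illustration only, the steps recited in claim 1 can be sketched as a toy training step. This is a minimal sketch under stated assumptions, not the patented implementation: the names ToyNet, real_loss, virtual_loss, mixed_loss, and train_step are hypothetical, the "network" is three scalar weights, each loss term is a placeholder mean squared error, the mixed loss is assumed to be a weighted sum of the real and virtual losses, and gradients are taken by finite differences so the example has no dependencies.

```python
from dataclasses import dataclass, fields, replace

def mse(pred, target):
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def diffs(values):
    """Neighbouring differences; a toy proxy for surface normals from depth."""
    return [b - a for a, b in zip(values, values[1:])]

@dataclass
class ToyNet:
    """Hypothetical stand-in for a multi-task network: shared weight, two heads."""
    w_shared: float = 0.5
    w_depth: float = 1.0
    w_sem: float = 1.0

    def forward(self, images):
        feats = [self.w_shared * x for x in images]
        return {"depth": [self.w_depth * f for f in feats],
                "semantic": [self.w_sem * f for f in feats]}

def real_loss(out, batch):
    # Placeholder self-supervised term: uses a second view, no ground-truth labels.
    return mse(out["depth"], batch["context"])

def virtual_loss(out, batch):
    # Placeholder supervised terms named in the claim, summed (an assumption).
    semantic = mse(out["semantic"], batch["semantic_gt"])
    depth = mse(out["depth"], batch["depth_gt"])
    normal = mse(diffs(out["depth"]), diffs(batch["depth_gt"]))
    synthesis = mse(out["depth"], batch["context"])
    return semantic + depth + normal + synthesis

def mixed_loss(net, real_batch, virtual_batch, real_weight=1.0, virtual_weight=1.0):
    out_real = net.forward(real_batch["images"])        # first execution: real data
    out_virtual = net.forward(virtual_batch["images"])  # second execution: virtual data
    return (real_weight * real_loss(out_real, real_batch)
            + virtual_weight * virtual_loss(out_virtual, virtual_batch))

def train_step(net, real_batch, virtual_batch, lr=0.01, eps=1e-6):
    """One update using the mixed loss; returns the loss before the update."""
    base = mixed_loss(net, real_batch, virtual_batch)
    grads = {}
    for f in fields(net):
        bumped = replace(net, **{f.name: getattr(net, f.name) + eps})
        grads[f.name] = (mixed_loss(bumped, real_batch, virtual_batch) - base) / eps
    for name, g in grads.items():
        setattr(net, name, getattr(net, name) - lr * g)
    return base

# Made-up toy data: real batch has no labels, virtual batch has ground truth.
real_batch = {"images": [1.0, 2.0, 3.0], "context": [1.1, 2.1, 2.9]}
virtual_batch = {"images": [0.5, 1.5, 2.5], "context": [0.6, 1.4, 2.6],
                 "depth_gt": [0.5, 1.5, 2.5], "semantic_gt": [1.0, 3.0, 5.0]}
net = ToyNet()
losses = [train_step(net, real_batch, virtual_batch) for _ in range(50)]
```

The point of the sketch is the structure: the real and virtual batches pass through the same network in separate forward executions, each output feeds its own loss, and a single update is driven by the combined value. A practical system would replace the placeholder terms with actual photometric, semantic, depth, surface normal, and synthesis losses and use automatic differentiation.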