US 12,012,127 B2
Top-down view object detection and tracking
Subhasis Das, Menlo Park, CA (US); Benjamin Isaac Zwiebel, Burlingame, CA (US); Kai Yu, Burlingame, CA (US); and James William Vaisey Philbin, Palo Alto, CA (US)
Assigned to Zoox, Inc., Foster City, CA (US)
Filed by Zoox, Inc., Foster City, CA (US)
Filed on Jan. 31, 2020, as Appl. No. 16/779,576.
Claims priority of provisional application 62/926,423, filed on Oct. 26, 2019.
Prior Publication US 2021/0181758 A1, Jun. 17, 2021
This patent is subject to a terminal disclaimer.
Int. Cl. B60W 60/00 (2020.01); G01S 13/89 (2006.01); G01S 13/931 (2020.01); G01S 17/89 (2020.01); G01S 17/931 (2020.01); G05D 1/00 (2024.01); G06T 7/215 (2017.01); G06T 7/246 (2017.01); G06T 7/292 (2017.01); G06V 10/25 (2022.01); G06V 10/778 (2022.01); G06V 10/80 (2022.01); G06V 20/56 (2022.01); G06V 30/19 (2022.01); G06V 30/24 (2022.01)

CPC B60W 60/0027 (2020.02) [G01S 13/89 (2013.01); G01S 13/931 (2013.01); G01S 17/89 (2013.01); G01S 17/931 (2020.01); G05D 1/0248 (2013.01); G06T 7/215 (2017.01); G06T 7/251 (2017.01); G06T 7/292 (2017.01); G06V 10/25 (2022.01); G06V 10/778 (2022.01); G06V 10/80 (2022.01); G06V 20/56 (2022.01); G06V 30/19147 (2022.01); G06V 30/19173 (2022.01); G06V 30/1918 (2022.01); G06V 30/2552 (2022.01); G01S 2013/932 (2020.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/30241 (2013.01); G06T 2207/30261 (2013.01)]

19 Claims

9. A system comprising:

one or more processors; and

a memory storing processor-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:

receiving first sensor data and second sensor data;

inputting the first sensor data to a first perception pipeline and inputting the second sensor data to a second perception pipeline;

receiving a first output from the first perception pipeline based at least in part on the first sensor data and a second output from the second perception pipeline, the first output and the second output identifying an object in an environment;

receiving a previous track associated with the object in the environment, the previous track identifying at least one of an estimated previous position of the object, a previous region of interest, or a previous velocity of the object;

inputting the first output, the second output, and at least part of the previous track into a machine-learning (ML) model;

receiving, from the ML model, a data structure comprising a region of interest, object classification, and a pose associated with the object, the pose indicating at least one of a position or a yaw associated with the object;

determining an updated track associated with the object based at least in part on the data structure, a current position, and at least one of the region of interest or the yaw associated with the object; and

updating, based at least in part on the data structure, one or more previous tracks by retiring the one or more previous tracks, wherein retiring the one or more previous tracks comprises indicating that the object associated with the one or more previous tracks has been occluded for a threshold amount of time.
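The last three operations of claim 9 (receiving the data structure, determining an updated track, and retiring previous tracks) can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the patented implementation: the names `Detection`, `Track`, `update_track`, and `retire_occluded` are hypothetical, the ML model is not modeled at all (a `Detection` simply stands in for its output), velocity is estimated by a plain finite difference, and a track is treated as occluded whenever it has received no update for the threshold time.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), top-down


@dataclass
class Detection:
    """Hypothetical stand-in for the data structure output by the ML model."""
    roi: Box
    classification: str
    position: Point
    yaw: float


@dataclass
class Track:
    """Hypothetical track record: position history plus latest ROI, yaw, velocity."""
    track_id: int
    positions: List[Point] = field(default_factory=list)
    roi: Optional[Box] = None
    yaw: Optional[float] = None
    velocity: Point = (0.0, 0.0)
    last_seen: float = 0.0
    retired: bool = False


def update_track(track: Track, det: Detection, now: float) -> Track:
    """Extend a track with the current position, ROI, and yaw from a detection."""
    dt = now - track.last_seen
    if track.positions and dt > 0:
        # Assumption: velocity is a simple finite difference of positions.
        px, py = track.positions[-1]
        track.velocity = ((det.position[0] - px) / dt,
                          (det.position[1] - py) / dt)
    track.positions.append(det.position)
    track.roi = det.roi
    track.yaw = det.yaw
    track.last_seen = now
    return track


def retire_occluded(tracks: List[Track], now: float,
                    threshold_s: float) -> List[Track]:
    """Retire tracks that have gone unobserved for at least threshold_s seconds.

    Assumption: "no update received" is used as a proxy for "occluded".
    """
    newly_retired = []
    for t in tracks:
        if not t.retired and now - t.last_seen >= threshold_s:
            t.retired = True
            newly_retired.append(t)
    return newly_retired


if __name__ == "__main__":
    seen = Track(track_id=1, positions=[(0.0, 0.0)], last_seen=0.0)
    stale = Track(track_id=2, positions=[(5.0, 5.0)], last_seen=0.0)
    det = Detection(roi=(0.0, 0.0, 2.0, 2.0), classification="vehicle",
                    position=(1.0, 0.0), yaw=0.1)
    update_track(seen, det, now=0.5)
    retired = retire_occluded([seen, stale], now=2.0, threshold_s=2.0)
    print(seen.velocity, [t.track_id for t in retired])
```

In this sketch the updated track keeps the full position history, so the current position, region of interest, and yaw all remain available to later steps; a real system would also need to associate each detection with the correct previous track, which is omitted here.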