US 12,307,784 B1
Object detection using augmented data
Po-Jen Lai, Mountain View, CA (US); Shuangting Liu, Foster City, CA (US); and Francesco Papi, Oakland, CA (US)
Assigned to Zoox, Inc., Foster City, CA (US)
Filed by Zoox, Inc., Foster City, CA (US)
Filed on Sep. 29, 2022, as Appl. No. 17/956,631.
Int. Cl. G06V 20/58 (2022.01); G06V 10/764 (2022.01); G06V 10/776 (2022.01)
CPC G06V 20/58 (2022.01) [G06V 10/764 (2022.01); G06V 10/776 (2022.01)]
20 Claims
 
1. A system comprising:
one or more processors; and
one or more non-transitory computer-readable media storing instructions executable by the one or more processors, wherein the instructions, when executed, cause the system to perform operations comprising:
receiving a first dataset comprising a first multichannel data structure representing an environment, the first multichannel data structure comprising a plurality of channels representing detections in the environment and one or more indications of a significant condition represented in the first multichannel data structure;
determining a property of a portion of the first multichannel data structure associated with an indication of the significant condition;
determining a portion of a second multichannel data structure associated with the property;
based at least in part on determining the portion of the second multichannel data structure:
determining augmented geometric data using a first augmentation based at least in part on first data associated with the portion of the second multichannel data structure; and
determining augmented non-geometric data using a second augmentation based at least in part on second data associated with the portion of the second multichannel data structure and the augmented geometric data, wherein the first augmentation is distinct from the second augmentation;
determining a second dataset comprising an augmented multichannel data structure based at least in part on the augmented geometric data and the augmented non-geometric data;
determining a top-down image based at least in part on the augmented multichannel data structure; and
training a machine-learned (ML) model to perform object detection based at least in part on the top-down image.
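
The operations recited in claim 1 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the `Detection` record standing in for one entry of a multichannel data structure, range band as the "property", a translation as the first (geometric) augmentation, and an inverse-square intensity rescaling as the second (non-geometric) augmentation. The claim does not specify any of these choices. The training step is left out; the returned grid is the kind of top-down image that would be passed to a training loop.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Detection:
    """Hypothetical record for one detection (not from the patent)."""
    x: float                   # geometric channel (metres, sensor frame)
    y: float                   # geometric channel
    intensity: float           # non-geometric channel
    significant: bool = False  # indication of a significant condition


def range_band(d: Detection, width: float = 10.0) -> int:
    """Assumed 'property': index of the range band containing the detection."""
    return int(math.hypot(d.x, d.y) // width)


def augment_geometric(d: Detection, shift: Tuple[float, float]) -> Tuple[float, float]:
    """First augmentation (assumed): translate the position."""
    return d.x + shift[0], d.y + shift[1]


def augment_non_geometric(d: Detection, new_xy: Tuple[float, float]) -> float:
    """Second augmentation (assumed): rescale intensity by inverse-square range,
    using both the original non-geometric value and the augmented position."""
    old_r = math.hypot(d.x, d.y)
    new_r = math.hypot(*new_xy)
    return d.intensity * (old_r / new_r) ** 2 if new_r > 0 else d.intensity


def augment(first: List[Detection], second: List[Detection],
            shift: Tuple[float, float]) -> List[Detection]:
    """Build an augmented structure from `second`, guided by `first`."""
    # Properties of the portions of `first` flagged as significant.
    bands = {range_band(d) for d in first if d.significant}
    out = []
    for d in second:
        if range_band(d) in bands:  # portion of `second` sharing the property
            xy = augment_geometric(d, shift)
            out.append(Detection(xy[0], xy[1], augment_non_geometric(d, xy), True))
        else:
            out.append(d)
    return out


def top_down(dets: List[Detection], size: int = 8, cell: float = 5.0):
    """Rasterise into a [channel][row][col] grid centred on the sensor.
    Channel 0 is occupancy; channel 1 is the maximum intensity per cell."""
    img = [[[0.0] * size for _ in range(size)] for _ in range(2)]
    half = size * cell / 2
    for d in dets:
        col, row = int((d.x + half) // cell), int((d.y + half) // cell)
        if 0 <= row < size and 0 <= col < size:
            img[0][row][col] = 1.0
            img[1][row][col] = max(img[1][row][col], d.intensity)
    return img


# Example inputs (made up for illustration).
first = [Detection(3.0, 4.0, 0.9, significant=True)]
second = [Detection(6.0, 0.0, 0.5), Detection(12.0, 0.0, 0.2)]
image = top_down(augment(first, second, shift=(2.0, 0.0)))
```

In this sketch the two augmentations are kept as separate functions so that the non-geometric step can take the augmented position as an input, mirroring the claim's requirement that the second augmentation depend on the augmented geometric data and be distinct from the first.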