US 11,887,324 B2
Cross-modality active learning for object detection
Kok Seang Tan, Serangoon (SG); Holger Caesar, Singapore (SG); and Oscar Olof Beijbom, Santa Monica, CA (US)
Assigned to Motional AD LLC, Boston, MA (US)
Filed by Motional AD LLC, Boston, MA (US)
Filed on Jun. 30, 2021, as Appl. No. 17/363,085.
Prior Publication US 2023/0005173 A1, Jan. 5, 2023
Int. Cl. G06T 7/70 (2017.01); G06F 18/21 (2023.01); G06T 11/00 (2006.01)
CPC G06T 7/70 (2017.01) [G06F 18/217 (2023.01); G06T 11/003 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/30261 (2013.01); G06T 2210/12 (2013.01)]
19 Claims
 
1. A method, comprising:
generating, by a processor, a first set of predicted bounding boxes based on images from an image sensor and a second set of predicted bounding boxes based on a point cloud from a LiDAR sensor, wherein a respective predicted bounding box of the first set of predicted bounding boxes and the second set of predicted bounding boxes is assigned a classification score indicating a presence of an object class instance within the respective predicted bounding box;
projecting, by the processor, the first set of predicted bounding boxes and the second set of predicted bounding boxes into a same representation;
filtering, by the processor, the projections wherein a first subset of predicted bounding boxes satisfying a maximum confidence score is selected from the first set of predicted bounding boxes and a second subset of predicted bounding boxes satisfying the maximum confidence score is selected from the second set of predicted bounding boxes;
calculating, by the processor, inconsistencies between the first subset of predicted bounding boxes associated with the images from the image sensor and the second subset of predicted bounding boxes associated with the point cloud from the LiDAR sensor based on filtering the projections;
extracting, by the processor, an informative scene based on the calculated inconsistencies; and
training, by the processor, a first object detection neural network or a second object detection neural network using the informative scene.
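
The filtering, inconsistency, and scene-extraction steps recited above can be illustrated with a minimal Python sketch. It is not the claimed implementation, and everything beyond the claim wording is an assumption made for illustration: the `Box` type, the function names, the reading of "satisfying a maximum confidence score" as a simple score threshold (`min_score`), the use of intersection-over-union with a same-label match (`iou_thr`) as the inconsistency measure, and the top-k ranking of scenes. Boxes are assumed to be already projected into a shared 2D representation; the projection and the training step are not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned box, already projected into the shared representation."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float  # classification score in [0, 1]
    label: str    # predicted object class


def filter_by_confidence(boxes: List[Box], min_score: float) -> List[Box]:
    # Assumption: "satisfying a confidence score" is read as score >= min_score.
    return [b for b in boxes if b.score >= min_score]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = min(a.x2, b.x2) - max(a.x1, b.x1)
    inter_h = min(a.y2, b.y2) - max(a.y1, b.y1)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def count_unmatched(src: List[Box], ref: List[Box], iou_thr: float) -> int:
    """Number of boxes in src with no same-label box in ref at IoU >= iou_thr."""
    return sum(
        1
        for s in src
        if not any(s.label == r.label and iou(s, r) >= iou_thr for r in ref)
    )


def inconsistency(camera: List[Box], lidar: List[Box],
                  min_score: float = 0.5, iou_thr: float = 0.5) -> int:
    # Assumed measure: confident detections that one modality reports
    # and the other does not, counted in both directions.
    cam = filter_by_confidence(camera, min_score)
    lid = filter_by_confidence(lidar, min_score)
    return count_unmatched(cam, lid, iou_thr) + count_unmatched(lid, cam, iou_thr)


def select_informative_scenes(scenes: Dict[str, Tuple[List[Box], List[Box]]],
                              top_k: int = 1) -> List[str]:
    """Return the names of the top_k scenes with the highest inconsistency."""
    ranked = sorted(
        scenes.items(),
        key=lambda item: inconsistency(item[1][0], item[1][1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]
```

In this sketch, a scene maps a name to a pair of box lists (camera-derived, LiDAR-derived). Scenes returned by `select_informative_scenes` would then be candidates for annotation and for training either detection network, under the assumptions stated above.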