US 11,854,280 B2
Learning monocular 3D object detection from 2D semantic keypoint detection
Arjun Bhargava, San Francisco, CA (US); Haofeng Chen, Stanford, CA (US); Adrien David Gaidon, Mountain View, CA (US); Rares A. Ambrus, San Francisco, CA (US); and Sudeep Pillai, Santa Clara, CA (US)
Assigned to TOYOTA RESEARCH INSTITUTE, INC., Los Altos, CA (US)
Filed by TOYOTA RESEARCH INSTITUTE, INC., Los Altos, CA (US)
Filed on Apr. 27, 2021, as Appl. No. 17/242,046.
Prior Publication US 2022/0343096 A1, Oct. 27, 2022
Int. Cl. G06V 20/00 (2022.01); G06V 20/64 (2022.01); G05D 1/02 (2020.01); G06V 20/40 (2022.01); G06V 20/56 (2022.01); G06F 18/214 (2023.01); G06F 18/21 (2023.01); G06N 3/04 (2023.01)

CPC G06V 20/64 (2022.01) [G05D 1/0251 (2013.01); G06F 18/214 (2023.01); G06F 18/2163 (2023.01); G06V 20/41 (2022.01); G06V 20/46 (2022.01); G06V 20/56 (2022.01); G05D 2201/0213 (2013.01); G06N 3/04 (2013.01); G06V 2201/08 (2022.01)]

14 Claims

1. A method for 3D object detection, comprising:

detecting semantic keypoints from monocular images of a video stream capturing a 3D object;

inferring 3D bounding boxes of the 3D object by indexing the inferred 3D bounding box according to predicted keypoint coordinates corresponding to the detected semantic keypoints;

scoring the inferred 3D bounding boxes of the 3D object according to an objectness score, an object classification score, and 10D bounding box parameters predicted according to the predicted coordinates of the detected semantic keypoints;

discarding overlapping ones of the inferred 3D bounding boxes as redundant based on a user-defined overlap threshold using non-maxima suppression to determine a final set of 3D bounding boxes; and

detecting the 3D object according to the final set of 3D bounding boxes generated based on the scoring of the inferred 3D bounding boxes using score-thresholding and the non-maxima suppression.
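
The last two steps of claim 1, score-thresholding followed by non-maxima suppression against a user-defined overlap threshold, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the `Box3D` container, the names `iou_3d` and `detect`, the choice to multiply the objectness and classification scores into one ranking score, and the use of axis-aligned 3D intersection-over-union (ignoring box orientation) as the overlap measure. The claim does not state how the scores are combined or how overlap is measured.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box3D:
    """Hypothetical container for one inferred box (not defined in the patent)."""
    cx: float  # box center, x
    cy: float  # box center, y
    cz: float  # box center, z
    w: float   # extent along x
    h: float   # extent along y
    l: float   # extent along z
    objectness: float   # objectness score in [0, 1]
    class_score: float  # object classification score in [0, 1]

    @property
    def score(self) -> float:
        # Assumption: rank boxes by the product of the two scores.
        return self.objectness * self.class_score

    @property
    def volume(self) -> float:
        return self.w * self.h * self.l


def _overlap_1d(c1: float, s1: float, c2: float, s2: float) -> float:
    """Length of the overlap of two intervals given by center and size."""
    lo = max(c1 - s1 / 2, c2 - s2 / 2)
    hi = min(c1 + s1 / 2, c2 + s2 / 2)
    return max(0.0, hi - lo)


def iou_3d(a: Box3D, b: Box3D) -> float:
    """Axis-aligned 3D IoU (a simplification: orientation is ignored)."""
    inter = (
        _overlap_1d(a.cx, a.w, b.cx, b.w)
        * _overlap_1d(a.cy, a.h, b.cy, b.h)
        * _overlap_1d(a.cz, a.l, b.cz, b.l)
    )
    union = a.volume + b.volume - inter
    return inter / union if union > 0 else 0.0


def detect(boxes: List[Box3D], score_threshold: float,
           overlap_threshold: float) -> List[Box3D]:
    """Score-threshold the boxes, then apply greedy non-maxima suppression."""
    # Score-thresholding: drop low-scoring boxes, then sort best-first.
    candidates = sorted(
        (b for b in boxes if b.score >= score_threshold),
        key=lambda b: b.score,
        reverse=True,
    )
    kept: List[Box3D] = []
    for box in candidates:
        # Keep a box only if it does not overlap an already-kept,
        # higher-scoring box by more than the user-defined threshold.
        if all(iou_3d(box, k) <= overlap_threshold for k in kept):
            kept.append(box)
    return kept
```

Under these assumptions, calling `detect(boxes, score_threshold=0.3, overlap_threshold=0.5)` returns the final set of boxes in descending score order. A production detector would more likely measure overlap with rotated-box IoU and run the suppression per object class.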