US 11,869,257 B2
AR-based labeling tool for 3D object detection model training
Guoqiang Hu, Shanghai (CN); Sheng Nan Zhu, Shanghai (CN); Yuan Yuan Ding, Shanghai (CN); Hong Bing Zhang, Beijing (CN); Dan Zhang, Beijing (CN); and Tian Tian Chai, Beijing (CN)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Mar. 19, 2021, as Appl. No. 17/206,255.
Prior Publication US 2022/0300738 A1, Sep. 22, 2022
Int. Cl. G06V 10/44 (2022.01); G06V 20/80 (2022.01); G06T 7/12 (2017.01); G06V 20/56 (2022.01); G06V 10/82 (2022.01)
CPC G06V 20/80 (2022.01) [G06T 7/12 (2017.01); G06V 10/44 (2022.01); G06V 10/82 (2022.01); G06V 20/56 (2022.01); G06T 2207/20084 (2013.01); G06T 2207/30252 (2013.01)] 25 Claims
OG exemplary drawing
 
1. A computer-implemented method for detecting and labeling an object in a 2D image, comprising:
receiving a plurality of 2D images from a visual sensor, each image of the plurality of 2D images including an image of a target object in a surrounding environment;
detecting features and extracting feature points from the plurality of 2D images;
manually marking visible feature points of the target object on each image of the plurality of 2D images;
estimating occluded feature points of the target object in at least one of the plurality of 2D images by defining axis lines starting from the marked visible feature points;
generating from the plurality of 2D images a 3D world coordinate system of the environment surrounding the target object;
mapping each of the marked feature points on the plurality of 2D images to the 3D world coordinate system;
generating a 3D map of the surrounding environment using the extracted feature points;
automatically generating a 3D bounding box for the target object covering all of the marked feature points mapped to the 3D world coordinate system;
determining a ground plane of the 3D world coordinate system;
automatically fitting, in the 3D world coordinate system, the 3D bounding box for the target object in each of the plurality of 2D images based on the visible and estimated occluded feature points, the axis lines and the ground plane;
mapping the 3D bounding box back to the plurality of 2D images; and
generating a label for the target object surrounded by the 3D bounding box on each of the plurality of 2D images using a machine learning object detection model and projecting the label to corresponding feature points of the target object in the 3D map of the surrounding environment.
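A minimal sketch of the geometric core of claim 1 follows, using only numpy: marked 2D feature points are mapped into a 3D world coordinate system, a ground-aligned 3D bounding box is fit over the mapped points, and the box corners are projected back into each 2D image. This is illustrative only, not the patented implementation; the camera intrinsics K, the per-image pose (R, t), the per-point depths, and the ground height are treated as known inputs here, whereas the claim derives the world coordinate system, the 3D map of the surrounding environment, and the ground plane from the plurality of 2D images themselves, and the function names are hypothetical.

import numpy as np

def backproject_to_world(pts_2d, depths, K, R, t):
    # Map (N, 2) pixel coordinates with known depths into the 3D world frame,
    # given intrinsics K and a world-to-camera pose (R, t).
    ones = np.ones((pts_2d.shape[0], 1))
    rays = (np.linalg.inv(K) @ np.hstack([pts_2d, ones]).T).T   # normalized camera rays
    pts_cam = rays * depths[:, None]                            # 3D points in the camera frame
    return (R.T @ (pts_cam - t).T).T                            # camera frame -> world frame

def fit_ground_aligned_box(pts_world, ground_z=0.0):
    # Fit an axis-aligned 3D box covering all mapped points, with its bottom
    # face snapped to the ground plane z = ground_z.
    lo, hi = pts_world.min(axis=0), pts_world.max(axis=0)
    lo[2] = min(lo[2], ground_z)
    xs, ys, zs = np.meshgrid([lo[0], hi[0]], [lo[1], hi[1]], [lo[2], hi[2]])
    return np.stack([xs, ys, zs], axis=-1).reshape(-1, 3)       # the 8 box corners

def project_to_image(pts_world, K, R, t):
    # Project 3D world points back into pixel coordinates of one image.
    pts_cam = (R @ pts_world.T).T + t
    uvw = (K @ pts_cam.T).T
    return uvw[:, :2] / uvw[:, 2:3]

# Toy usage (illustrative values only): a camera 1.5 m above a z-up ground plane,
# looking along the world x-axis, with two marked feature points on the target.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R = np.array([[0.0, -1.0, 0.0], [0.0, 0.0, -1.0], [1.0, 0.0, 0.0]])
t = np.array([0.0, 1.5, 0.0])
pts_2d = np.array([[300.0, 300.0], [360.0, 220.0]])
depths = np.array([4.0, 4.2])
pts_world = backproject_to_world(pts_2d, depths, K, R, t)
corners_3d = fit_ground_aligned_box(pts_world)                  # 3D bounding box in world coordinates
corners_2d = project_to_image(corners_3d, K, R, t)              # box mapped back to the 2D image

In a labeling pipeline of the kind claimed, the projected corners would delimit the region on each 2D image to which the label generated for the target object is attached.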