US 11,854,255 B2
Human-object scene recognition method, device and computer-readable storage medium
Chuqiao Dong, Pasadena, CA (US); Dan Shao, San Gabriel, CA (US); Zhen Xiu, Chino Hills, CA (US); Dejun Guo, San Gabriel, CA (US); and Huan Tan, Pasadena, CA (US)
Assigned to UBKANG (QINGDAO) TECHNOLOGY CO., LTD., Qingdao (CN)
Filed by UBKang (Qingdao) Technology Co., Ltd., Qingdao (CN)
Filed on Jul. 27, 2021, as Appl. No. 17/386,531.
Prior Publication US 2023/0030837 A1, Feb. 2, 2023
Int. Cl. G06V 20/10 (2022.01); G06T 7/11 (2017.01); G06T 7/70 (2017.01); G06V 40/10 (2022.01)

CPC G06V 20/10 (2022.01) [G06T 7/11 (2017.01); G06T 7/70 (2017.01); G06V 40/10 (2022.01); G06T 2207/10024 (2013.01); G06T 2207/10028 (2013.01); G06T 2207/30196 (2013.01)]

20 Claims

1. A computer-implemented human-object scene recognition method executed by one or more processors, the method comprising:

acquiring an input RGB image and a depth image corresponding to the RGB image;

detecting objects and/or humans in the RGB image using a segmentation classification algorithm based on a sample database;

in response to detection of objects and/or humans, performing a segment detection to each of the detected objects and/or humans based on the RGB image and the depth image, and acquiring a result of the segment detection;

calculating 3D bounding boxes for each of the detected objects and/or humans according to the result of the segment detection; and

determining a position of each of the detected objects and/or humans according to the 3D bounding boxes;

wherein detecting the objects and/or humans in the RGB image using the segmentation classification algorithm based on the sample database comprises:

generating segmentation masks for the objects and/or humans in the RGB image to acquire coordinates of pixels corresponding to each of the objects and/or humans in the RGB image; and

wherein performing the segment detection to each of the detected objects and/or humans based on the RGB image and the depth image comprises:

shrinking contours of objects and/or humans in each segment of the RGB image and the depth image inwardly using an erode algorithm, to acquire confident segments of the objects and/or humans in each segment of the RGB image and the depth image; and

calculating the 3D bounding boxes corresponding to shrank data using a Convex Hull algorithm to compensate for volume of the objects and/or humans in each segment of the RGB image and the depth image.
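
The steps recited in claim 1 (mask erosion, use of the depth image, convex hull, 3D bounding box, position) can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything specific in it is an assumption made for illustration: the pinhole camera intrinsics `fx`, `fy`, `cx`, `cy`, the 3x3 cross-shaped erosion kernel, the choice to take the convex hull of the footprint on the X-Z plane, the axis-aligned box, and all function names. The claim does not state how the convex hull compensates for the volume lost to erosion, so the sketch simply boxes the hull of the eroded points and reports the box centre as the position.

```python
import numpy as np

def erode(mask, iterations=1):
    """Shrink a binary mask inward with a 3x3 cross kernel (assumed kernel)."""
    m = mask.astype(bool)
    for _ in range(iterations):
        p = np.pad(m, 1, constant_values=False)
        # A pixel survives only if it and its four direct neighbours are set.
        m = (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
             & p[1:-1, :-2] & p[1:-1, 2:])
    return m

def back_project(mask, depth, fx, fy, cx, cy):
    """Lift masked pixels with valid depth to 3D camera coordinates (pinhole model, assumed)."""
    v, u = np.nonzero(mask & (depth > 0))
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def convex_hull_2d(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return np.array(pts)

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return np.array(lower[:-1] + upper[:-1])

def bounding_box_3d(points):
    """Axis-aligned 3D box from the X-Z footprint hull and the Y extent.

    Expects a non-empty (N, 3) array; returns (min corner, max corner, centre).
    """
    hull = convex_hull_2d(points[:, [0, 2]])
    lo = np.array([hull[:, 0].min(), points[:, 1].min(), hull[:, 1].min()])
    hi = np.array([hull[:, 0].max(), points[:, 1].max(), hull[:, 1].max()])
    return lo, hi, (lo + hi) / 2.0
```

In use, one segmentation mask per detected object or human would be passed through `erode`, lifted to 3D with `back_project` using the aligned depth image, and boxed with `bounding_box_3d`; the returned centre stands in for the position of the object or human.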