CPC G06V 20/10 (2022.01) [G06T 7/11 (2017.01); G06T 7/70 (2017.01); G06V 40/10 (2022.01); G06T 2207/10024 (2013.01); G06T 2207/10028 (2013.01); G06T 2207/30196 (2013.01)] | 20 Claims
1. A computer-implemented human-object scene recognition method executed by one or more processors, the method comprising:
acquiring an input RGB image and a depth image corresponding to the RGB image;
detecting objects and/or humans in the RGB image using a segmentation classification algorithm based on a sample database;
in response to detection of objects and/or humans, performing a segment detection to each of the detected objects and/or humans based on the RGB image and the depth image, and acquiring a result of the segment detection;
calculating 3D bounding boxes for each of the detected objects and/or humans according to the result of the segment detection; and
determining a position of each of the detected objects and/or humans according to the 3D bounding boxes;
wherein detecting the objects and/or humans in the RGB image using the segmentation classification algorithm based on the sample database comprises:
generating segmentation masks for the objects and/or humans in the RGB image to acquire coordinates of pixels corresponding to each of the objects and/or humans in the RGB image; and
wherein performing the segment detection to each of the detected objects and/or humans based on the RGB image and the depth image comprises:
shrinking contours of objects and/or humans in each segment of the RGB image and the depth image inwardly using an erode algorithm, to acquire confident segments of the objects and/or humans in each segment of the RGB image and the depth image; and
calculating the 3D bounding boxes corresponding to the shrunk data using a Convex Hull algorithm to compensate for the volume of the objects and/or humans in each segment of the RGB image and the depth image.
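The mask-to-coordinates step and the inward contour shrinking described in the claim can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each segmentation mask is assumed to be a binary grid (1 = object/human pixel), and standard binary erosion with a 3x3 square structuring element stands in for the "erode algorithm". The structuring element, the iteration count, and the helper names `mask_to_pixels` and `erode` are hypothetical.

```python
from typing import List, Tuple

Mask = List[List[int]]  # hypothetical format: 1 = object/human pixel, 0 = background


def mask_to_pixels(mask: Mask) -> List[Tuple[int, int]]:
    """Return (row, col) coordinates of every foreground pixel, row-major."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def erode(mask: Mask, iterations: int = 1) -> Mask:
    """Binary erosion with an assumed 3x3 square structuring element.

    A pixel survives only if it and all 8 neighbours are foreground.
    Pixels outside the image count as background, so the contour
    shrinks inward by one pixel per iteration.
    """
    h = len(mask)
    w = len(mask[0]) if h else 0
    out = [row[:] for row in mask]
    for _ in range(iterations):
        src = out
        out = [[0] * w for _ in range(h)]
        for r in range(h):
            for c in range(w):
                if all(
                    0 <= r + dr < h and 0 <= c + dc < w and src[r + dr][c + dc]
                    for dr in (-1, 0, 1)
                    for dc in (-1, 0, 1)
                ):
                    out[r][c] = 1
    return out


# Usage: a 5x5 segment inside a 7x7 image shrinks to its 3x3 interior.
mask = [[1 if 1 <= r <= 5 and 1 <= c <= 5 else 0 for c in range(7)] for r in range(7)]
confident = mask_to_pixels(erode(mask))
print(len(mask_to_pixels(mask)), len(confident))  # prints: 25 9
```

Discarding the one-pixel rim is what makes the remaining pixels "confident": boundary pixels are where RGB segmentation and depth are most likely to disagree, so depth values near the contour often belong to the background.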
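Building 3D geometry from a confident segment requires pairing each surviving (row, col) pixel with its depth value. A minimal sketch, assuming the depth image is already registered to the RGB image (same size and viewpoint) and that a pinhole camera model applies. The intrinsics `fx`, `fy`, `cx`, `cy`, the `depth_scale` factor (0.001 assumes depth stored in millimetres), and the function name `back_project` are hypothetical and are not taken from the claim.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def back_project(
    pixels: Sequence[Tuple[int, int]],
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    depth_scale: float = 0.001,  # assumption: raw depth in millimetres
) -> List[Point3]:
    """Lift (row, col) pixels to camera-frame (x, y, z) points.

    Assumes depth is registered to the RGB image; pixels whose depth
    is zero (a missing measurement) are skipped.
    """
    points: List[Point3] = []
    for r, c in pixels:
        z = depth[r][c] * depth_scale  # raw depth value -> metres
        if z <= 0:
            continue
        # invert the pinhole projection col = fx * x / z + cx, row = fy * y / z + cy
        points.append(((c - cx) * z / fx, (r - cy) * z / fy, z))
    return points
```

Skipping zero-depth pixels matters because many depth sensors report 0 where no measurement was obtained; back-projecting those would place spurious points at the camera origin and inflate the bounding box.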
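The claim does not say how the Convex Hull algorithm produces a 3D bounding box, nor how the volume compensation works. The sketch below substitutes one common construction, which is an assumption and not the claimed method. It takes the convex hull of the points' ground-plane (x, z) footprint and finds the minimum-area enclosing rectangle. Because one side of that rectangle is collinear with a hull edge, only hull-edge orientations need testing. The rectangle is then extruded over the vertical extent of the points. The sketch also assumes a roughly level camera, so the camera-frame y axis approximates the vertical. `convex_hull` and `fit_box` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]


def convex_hull(points: Sequence[Point2]) -> List[Point2]:
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point2, a: Point2, b: Point2) -> float:
        # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point2] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point2] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # drop each chain's last point, which repeats the other chain's first
    return lower[:-1] + upper[:-1]


def fit_box(points: Sequence[Point3]) -> Tuple[Point3, Point3, float]:
    """Fit a box with a vertical y axis to camera-frame (x, y, z) points.

    Returns (center, (length, width, height), yaw), where yaw is the
    rotation of the footprint rectangle in the x-z plane.
    """
    if not points:
        raise ValueError("no points to fit")
    hull = convex_hull([(x, z) for x, _, z in points])
    ys = [y for _, y, _ in points]
    best = None
    # try the orientation of every hull edge (wrapping around to the first vertex)
    for (x0, z0), (x1, z1) in zip(hull, hull[1:] + hull[:1]):
        yaw = math.atan2(z1 - z0, x1 - x0)
        c, s = math.cos(yaw), math.sin(yaw)
        # project hull vertices onto the edge direction (u) and its normal (v)
        us = [x * c + z * s for x, z in hull]
        vs = [-x * s + z * c for x, z in hull]
        area = (max(us) - min(us)) * (max(vs) - min(vs))
        if best is None or area < best[0]:
            best = (area, yaw, min(us), max(us), min(vs), max(vs))
    _, yaw, u0, u1, v0, v1 = best
    c, s = math.cos(yaw), math.sin(yaw)
    uc, vc = (u0 + u1) / 2, (v0 + v1) / 2
    # rotate the rectangle center from (u, v) back to (x, z)
    center = (uc * c - vc * s, (min(ys) + max(ys)) / 2, uc * s + vc * c)
    return center, (u1 - u0, v1 - v0, max(ys) - min(ys)), yaw
```

The returned center can serve as the position of the detected object or human in the camera frame. Fitting to hull-edge orientations gives a tighter footprint than an axis-aligned box for rotated objects. For a square footprint rotated by 45 degrees, the fitted rectangle has half the area of the axis-aligned one.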