US 11,669,930 B2
Multicamera image processing
Kevin Jose Chavez, Woodside, CA (US); Yuan Gao, Mountain View, CA (US); Rohit Pidaparthi, Mountain View, CA (US); Talbot Morris-Downing, Menlo Park, CA (US); Harry Zhe Su, Union City, CA (US); and Samir Menon, Palo Alto, CA (US)
Assigned to Dexterity, Inc., Redwood City, CA (US)
Filed by Dexterity, Inc., Palo Alto, CA (US)
Filed on Oct. 29, 2019, as Appl. No. 16/667,661.
Application 16/667,661 is a continuation-in-part of application No. 16/380,859, filed on Apr. 10, 2019, granted, now Pat. No. 10,549,928.
Claims priority of provisional application 62/809,389, filed on Feb. 22, 2019.
Prior Publication US 2020/0273138 A1, Aug. 27, 2020
Int. Cl. G06T 1/00 (2006.01); G06T 1/20 (2006.01); G06T 7/174 (2017.01); G06T 7/593 (2017.01); G06T 7/90 (2017.01); H04N 13/204 (2018.01)
CPC G06T 1/0014 (2013.01) [G06T 1/20 (2013.01); G06T 7/174 (2017.01); G06T 7/593 (2017.01); G06T 7/90 (2017.01); G06T 2207/10024 (2013.01); G06T 2207/10028 (2013.01); G06T 2207/20221 (2013.01); H04N 13/204 (2018.05)] 21 Claims
OG exemplary drawing
 
1. A system, comprising:
a communication interface configured to receive image data from each of a plurality of sensors associated with a workspace, wherein the image data comprises, for each sensor in the plurality of sensors, one or both of visual image information and depth information, and the plurality of sensors comprises a plurality of cameras; and
a processor coupled to the communication interface and configured to:
merge image data from the plurality of sensors to generate merged point cloud data;
perform segmentation based on visual image data from a subset of the sensors in the plurality of sensors to generate a segmentation result, wherein:
the segmentation result is obtained based on performing segmentation using RGB data from a camera;
the segmentation result comprises a plurality of RGB pixels; and
a subset of the plurality of RGB pixels is identified based at least in part on a determination that the corresponding RGB pixels are associated with an object boundary;
use one or both of the merged point cloud data and the segmentation result to generate a merged three-dimensional and segmented view of the workspace, including by:
mapping RGB pixels identified in the segmentation result to corresponding depth pixels to obtain mapped depth pixel information;
using the segmentation result and the mapped depth pixel information to de-project to a point cloud with segmented shapes around points for each object;
for each of the plurality of cameras, labelling the point cloud generated by that camera for each object and computing a corresponding centroid; and
using nearest neighbor computations between centroids of corresponding object point clouds of the plurality of cameras to segment objects within the workspace; and
use one or both of the merged point cloud data and the segmentation result to determine a strategy to grasp an object present in the workspace using a robotic arm.
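
The following sketches illustrate the claimed processing steps. First, the merging step: a minimal numpy sketch of generating merged point cloud data from several depth cameras, assuming each camera provides a depth image in meters together with known pinhole intrinsics K and a 4x4 camera-to-workspace extrinsic T. The function names are illustrative, not taken from the patent.

import numpy as np

def deproject_depth(depth, K):
    # De-project a depth image to camera-frame 3-D points using
    # pinhole intrinsics K; invalid (zero) depths are dropped.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)  # shape (N, 3)

def merge_point_clouds(depth_images, intrinsics, extrinsics):
    # Transform each camera's points into the shared workspace frame
    # and concatenate them into one merged cloud.
    clouds = []
    for depth, K, T in zip(depth_images, intrinsics, extrinsics):
        pts = deproject_depth(depth, K)                # camera frame
        pts_h = np.hstack([pts, np.ones((len(pts), 1))])
        clouds.append((pts_h @ T.T)[:, :3])            # workspace frame
    return np.vstack(clouds)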
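Next, mapping the RGB pixels identified in the segmentation result to corresponding depth pixels and de-projecting to per-object point clouds. This sketch assumes the RGB and depth images are pixel-registered and that an upstream segmenter (the claim recites RGB-based segmentation but no specific model) has produced an integer instance mask with 0 as background.

import numpy as np

def segmented_clouds(mask, depth, K):
    # For each labelled object in the segmentation mask, select its
    # pixels with valid depth and de-project them into that object's
    # own point cloud (camera frame; transform with the camera's
    # extrinsic T as in the merging sketch above).
    clouds = {}
    for label in np.unique(mask):
        if label == 0:
            continue  # background
        sel = (mask == label) & (depth > 0)
        v, u = np.nonzero(sel)  # row, column pixel indices
        z = depth[sel]
        x = (u - K[0, 2]) * z / K[0, 0]
        y = (v - K[1, 2]) * z / K[1, 1]
        clouds[int(label)] = np.stack([x, y, z], axis=1)
    return clouds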
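Then, computing a centroid for each camera's per-object clouds and associating objects across cameras by nearest-neighbor search between centroids, as recited in the claim. Both input cloud sets are assumed to already be expressed in the shared workspace frame; the 5 cm match threshold is an assumed tolerance, not a value from the patent.

import numpy as np

def associate_objects(clouds_a, clouds_b, max_dist=0.05):
    # Match per-object clouds from two cameras by nearest-neighbor
    # distance between their centroids; pairs closer than max_dist
    # are treated as the same physical object.
    cents_a = {k: c.mean(axis=0) for k, c in clouds_a.items()}
    cents_b = {k: c.mean(axis=0) for k, c in clouds_b.items()}
    matches = {}
    for ka, ca in cents_a.items():
        if not cents_b:
            break
        kb, d = min(
            ((kb, float(np.linalg.norm(ca - cb))) for kb, cb in cents_b.items()),
            key=lambda t: t[1],
        )
        if d <= max_dist:
            matches[ka] = kb
    return matches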
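The claim leaves the grasp-strategy determination open, requiring only that one or both of the merged point cloud data and the segmentation result inform it. As a purely hypothetical placeholder (not the patented method), one simple rule selects the object whose centroid sits highest along the workspace z-axis, assuming a z-up workspace frame:

def choose_grasp_target(object_clouds):
    # Illustrative heuristic only: grasp the object with the highest
    # centroid, i.e. the one least likely to be buried in a pile.
    return max(object_clouds, key=lambda k: object_clouds[k].mean(axis=0)[2])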