US 11,734,910 B2
Real-time object detection using depth sensors
Niluthpol Chowdhury Mithun, Lawrenceville, NJ (US); Sirajum Munir, Pittsburgh, PA (US); and Charles Shelton, Monroeville, PA (US)
Assigned to Robert Bosch GmbH, Stuttgart (DE)
Appl. No. 16/971,054
Filed by Robert Bosch GmbH, Stuttgart (DE)
PCT Filed Feb. 19, 2019, PCT No. PCT/EP2019/054016,
§ 371(c)(1), (2) Date Aug. 19, 2020,
PCT Pub. No. WO2019/162241, PCT Pub. Date Aug. 29, 2019.
Claims priority of provisional application 62/633,202, filed on Feb. 21, 2018.
Prior Publication US 2021/0089841 A1, Mar. 25, 2021
Int. Cl. G06V 10/44 (2022.01); G06T 7/00 (2017.01); G06T 7/90 (2017.01); G06N 3/04 (2023.01); G06F 18/2413 (2023.01); G06F 18/214 (2023.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01); G06V 20/52 (2022.01)
CPC G06V 10/454 (2022.01) [G06F 18/214 (2023.01); G06F 18/24133 (2023.01); G06N 3/04 (2013.01); G06T 7/90 (2017.01); G06T 7/97 (2017.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01); G06V 20/52 (2022.01); G06T 2207/10024 (2013.01); G06T 2207/10028 (2013.01)] 12 Claims
OG exemplary drawing
 
1. A method of training a convolutional neural network for depth-based object detection, the method comprising:
storing, in a memory, data and program instructions corresponding to a convolutional neural network, including:
a base network configured to receive RGB image data as input and compute output data indicative of at least one feature of an object in the received RGB image data, the base network pre-trained to compute feature detections using an RGB image dataset; and
additional structure configured to receive the output data of the base network as input and compute predictions of a location of a region in the received RGB image that includes the object and of a class of the object, such that the convolutional neural network is configured to receive RGB test image data as input and compute the predictions as output;
storing, in the memory, (i) a dataset of training depth images, each including at least one annotation that localizes a region of a respective depth image as containing a training object and identifies a class of the training object and (ii) a complexity metric for each image in the dataset of training depth images, the complexity metric indicative of a feature complexity of the respective image;
generating, with a processor, a training dataset for the convolutional neural network by reformatting each image in the dataset of training depth images as an RGB image; and
training, with the processor, the convolutional neural network using the training dataset to form a depth-based object-detection convolutional neural network configured to receive a depth image formatted as RGB image data as input and compute predictions of a location of a region in the received depth image that includes a test object and of a class of the test object as output, the training including:
segmenting the training dataset into a first batch and a second batch such that the first batch has complexity metrics that are higher than complexity metrics of the second batch; and
introducing the training dataset to the convolutional neural network according to a curriculum that orders images within the training dataset by ascending complexity metric, the curriculum including training the convolutional neural network using the first batch of the training dataset and then training the convolutional neural network using the first batch and the second batch of the training dataset.
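
The network recited in claim 1 pairs an RGB-pre-trained base network with additional structure that predicts a region location and an object class. The following is a minimal PyTorch sketch of that arrangement, assuming an ImageNet-pre-trained VGG-16 backbone and a single-box prediction head; both choices, along with the class and layer names (DepthAsRGBDetector, box_regressor, classifier), are illustrative assumptions rather than the patented implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

class DepthAsRGBDetector(nn.Module):
    """Minimal sketch of the claimed arrangement: an RGB-pre-trained base
    network followed by additional structure that predicts a bounding-box
    location and an object class.  The single-box head is a toy stand-in;
    the patented detector is not limited to this form."""

    def __init__(self, num_classes: int):
        super().__init__()
        # Base network pre-trained on an RGB image dataset (here: ImageNet).
        self.base = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features
        self.pool = nn.AdaptiveAvgPool2d((7, 7))
        # Additional structure consuming the base network's feature output.
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, 256),
            nn.ReLU(inplace=True),
        )
        self.box_regressor = nn.Linear(256, 4)         # region location (x, y, w, h)
        self.classifier = nn.Linear(256, num_classes)  # object class scores

    def forward(self, rgb_formatted_depth: torch.Tensor):
        features = self.pool(self.base(rgb_formatted_depth))
        shared = self.head(features)
        return self.box_regressor(shared), self.classifier(shared)
```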
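
The dataset-generation step reformats each single-channel depth image as an RGB image so that it can be fed to the RGB-pre-trained network. A minimal sketch follows, assuming a normalize-and-replicate encoding; the claim does not specify a particular encoding, and the function name depth_to_rgb is hypothetical.

```python
import numpy as np

def depth_to_rgb(depth: np.ndarray) -> np.ndarray:
    """Reformat a single-channel depth image as 3-channel RGB image data.

    The claim only recites reformatting each training depth image as an RGB
    image; the normalize-and-replicate encoding below is one illustrative
    choice, not necessarily the encoding used in the patented method."""
    depth = depth.astype(np.float32)
    d_min, d_max = float(np.nanmin(depth)), float(np.nanmax(depth))
    scale = max(d_max - d_min, 1e-6)           # guard against a flat depth map
    normalized = np.clip((depth - d_min) / scale, 0.0, 1.0)
    rgb = np.stack([normalized] * 3, axis=-1)  # H x W x 3, values in [0, 1]
    return (rgb * 255.0).astype(np.uint8)      # standard 8-bit RGB layout
```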
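
The training step segments the reformatted dataset into two batches by their complexity metrics and introduces them in two phases: the first batch alone, then the first and second batches together. The sketch below assumes hypothetical names (TrainingSample, segment_by_complexity, train_with_curriculum, train_step, split_fraction, epochs_per_phase) and an easy-to-hard phase ordering consistent with conventional curriculum learning; the claim itself governs which batch is introduced first and where the split point and phase lengths lie.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

import numpy as np

@dataclass
class TrainingSample:
    """One reformatted depth image with its annotation and complexity metric.
    Field names are illustrative, not taken from the patent."""
    rgb_image: np.ndarray  # depth image already reformatted as RGB
    annotation: dict       # e.g. {"box": (x, y, w, h), "label": "person"}
    complexity: float      # per-image feature-complexity metric

def segment_by_complexity(
    samples: Sequence[TrainingSample], split_fraction: float = 0.5
) -> Tuple[List[TrainingSample], List[TrainingSample]]:
    """Order samples by ascending complexity metric and split them into two
    batches at split_fraction (an assumed parameter; the claim does not
    specify where the boundary lies)."""
    ordered = sorted(samples, key=lambda s: s.complexity)
    cut = int(len(ordered) * split_fraction)
    return ordered[:cut], ordered[cut:]

def train_with_curriculum(
    train_step: Callable[[Sequence[TrainingSample]], None],
    samples: Sequence[TrainingSample],
    epochs_per_phase: int = 10,
) -> None:
    """Two-phase curriculum: train on one batch alone, then on both batches
    together.  train_step stands in for one pass of whatever detector-training
    routine is used (hypothetical callback)."""
    first_batch, second_batch = segment_by_complexity(samples)
    for _ in range(epochs_per_phase):  # phase 1: one batch only
        train_step(first_batch)
    combined = list(first_batch) + list(second_batch)
    for _ in range(epochs_per_phase):  # phase 2: both batches together
        train_step(combined)
```

In use, train_step would wrap a single epoch of the detector's usual loss computation and optimizer update over the samples it is handed, so the curriculum logic stays independent of the particular detection network being trained.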