US 12,456,287 B2
Synthetic dataset creation for object detection and classification with deep learning
Eugen Solowjow, Berkeley, CA (US); Ines Ugalde Diaz, Redwood City, CA (US); Yash Shahapurkar, Berkeley, CA (US); Juan L. Aparicio Ojea, Moraga, CA (US); and Heiko Claussen, Wayland, MA (US)
Assigned to Siemens Corporation, Washington, DC (US)
Appl. No. 18/578,471
Filed by Siemens Corporation, Washington, DC (US)
PCT Filed Aug. 6, 2021, PCT No. PCT/US2021/044867
§ 371(c)(1), (2) Date Jan. 11, 2024,
PCT Pub. No. WO2023/014369, PCT Pub. Date Feb. 9, 2023.
Prior Publication US 2024/0296662 A1, Sep. 5, 2024
Int. Cl. G06V 10/774 (2022.01); G06T 15/20 (2011.01); G06V 10/764 (2022.01); G06V 10/776 (2022.01); G06V 20/70 (2022.01)
CPC G06V 10/774 (2022.01) [G06T 15/20 (2013.01); G06V 10/764 (2022.01); G06V 10/776 (2022.01); G06V 20/70 (2022.01)] 12 Claims
OG exemplary drawing
 
1. A computer-implemented method for building an object detection module, comprising:
obtaining mesh representations of objects belonging to specified object classes of interest;
rendering a plurality of images by a physics-based simulator using the mesh representations of the objects, wherein each rendered image captures a simulated environment containing objects belonging to multiple of said object classes of interest placed in a bin or on a table, wherein the plurality of rendered images are generated by randomizing a set of parameters by the simulator to render a range of simulated environments, the set of parameters including environmental and sensor-based parameters;
generating a label for each rendered image, the label including a two-dimensional representation indicative of location and object classes of objects in the respective rendered image frame, wherein each rendered image and the respective label constitute a data sample of a synthetic training dataset;
training a deep learning model using the synthetic training dataset to output object classes from an input image of a real-world physical environment;
deploying the trained deep learning model for testing on a set of real-world test images to generate an inference output for each test image;
adjusting the training dataset based on a success of the generated inference outputs;
identifying a “failure” image from the set of test images for which the generated inference output does not meet a defined success criterion;
feeding the “failure” image to the simulator to render additional images by randomizing the set of parameters around an environmental or sensor-based setting that corresponds to the “failure” image and to generate a respective label for each additional rendered image; and
retraining the deep learning model using the rendered additional images and the respective labels.
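The claimed randomization step can be sketched in plain Python. The claim does not name specific parameters, so the fields below (lighting, camera placement, sensor noise) are illustrative stand-ins for the "environmental and sensor-based parameters"; the same sampler covers both the initial wide randomization and the narrow re-sampling around a failure setting.

```python
import random
from dataclasses import dataclass

# Hypothetical parameter set: the claim specifies only "environmental
# and sensor-based parameters", so these fields are illustrative.
@dataclass
class RenderParams:
    light_intensity: float    # environmental: scene illumination
    light_azimuth_deg: float  # environmental: light direction
    camera_height_m: float    # sensor-based: camera placement above the bin/table
    camera_tilt_deg: float    # sensor-based: viewing angle
    noise_sigma: float        # sensor-based: simulated image noise

# (min, max) range over which the simulator randomizes each parameter
RANGES = {
    "light_intensity": (0.2, 2.0),
    "light_azimuth_deg": (0.0, 360.0),
    "camera_height_m": (0.5, 1.5),
    "camera_tilt_deg": (-30.0, 30.0),
    "noise_sigma": (0.0, 0.05),
}

def sample_params(rng, center=None, spread=0.1):
    """Draw one randomized parameter set.

    With center=None, sample uniformly over each full range (initial
    dataset generation).  With a center, jitter around that setting,
    clamped to the range -- the narrow re-sampling performed around the
    environmental/sensor setting of a "failure" image.
    """
    values = {}
    for name, (lo, hi) in RANGES.items():
        if center is None:
            values[name] = rng.uniform(lo, hi)
        else:
            width = (hi - lo) * spread
            c = getattr(center, name)
            values[name] = min(hi, max(lo, c + rng.uniform(-width, width)))
    return RenderParams(**values)
```

Each sampled `RenderParams` would be handed to the physics-based simulator to render one image of the simulated environment.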
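The label-generation step pairs each rendered image with a two-dimensional representation of object location and class. A minimal sketch, assuming the simulator can already project each object's mesh vertices into image coordinates (a hypothetical `projected_vertices` input), derives an axis-aligned bounding box plus class name:

```python
def bbox_label(projected_vertices, class_name):
    """Build a simple 2D label (axis-aligned bounding box + object class)
    from an object's mesh vertices projected into image coordinates.

    projected_vertices: iterable of (x, y) pixel coordinates.
    Returns a dict with the class and the (x_min, y_min, x_max, y_max) box.
    """
    xs = [x for x, _ in projected_vertices]
    ys = [y for _, y in projected_vertices]
    return {"class": class_name,
            "bbox": (min(xs), min(ys), max(xs), max(ys))}
```

One rendered image's label would then be the list of such dicts, one per object in the scene; together, image and label form one data sample of the synthetic training dataset.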
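The closing steps of the claim form a feedback loop: test the trained model, flag "failure" images, re-render around each failure's setting, and retrain. A self-contained sketch, using a hypothetical per-image quality score as the "defined success criterion" and a caller-supplied `render_and_label` callable standing in for the physics-based simulator:

```python
import random

def find_failures(test_results, min_score=0.5):
    """Identify "failure" images: test images whose inference output
    misses the success criterion (here, a hypothetical quality score)."""
    return [r for r in test_results if r["score"] < min_score]

def jitter_setting(setting, ranges, rng, spread=0.1):
    """Randomize parameters *around* a failure image's environmental/
    sensor setting: uniform jitter of +/- spread * range-width, clamped."""
    out = {}
    for name, value in setting.items():
        lo, hi = ranges[name]
        width = (hi - lo) * spread
        out[name] = min(hi, max(lo, value + rng.uniform(-width, width)))
    return out

def retraining_batch(failures, ranges, render_and_label, n_extra=10, seed=0):
    """Render the additional labeled samples used to retrain the model:
    n_extra new (image, label) pairs per failure, each from a setting
    jittered around that failure's configuration."""
    rng = random.Random(seed)
    batch = []
    for f in failures:
        for _ in range(n_extra):
            batch.append(render_and_label(jitter_setting(f["setting"], ranges, rng)))
    return batch
```

Retraining then proceeds on the original synthetic dataset augmented with this batch; the loop can repeat until the inference outputs on the real-world test set meet the success criterion.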