US 12,153,414 B2
Imitation learning in a manufacturing environment
Matthew C. Putman, Brooklyn, NY (US); Andrew Sundstrom, Brooklyn, NY (US); Damas Limoge, Brooklyn, NY (US); Vadim Pinskiy, Wayne, NJ (US); Aswin Raghav Nirmaleswaran, Brooklyn, NY (US); and Eun-Sol Kim, Cliffside Park, NJ (US)
Assigned to Nanotronics Imaging, Inc., Cuyahoga Falls, OH (US)
Filed by Nanotronics Imaging, Inc., Cuyahoga Falls, OH (US)
Filed on Feb. 25, 2022, as Appl. No. 17/652,607.
Claims priority of provisional application 63/153,811, filed on Feb. 25, 2021.
Prior Publication US 2022/0269254 A1, Aug. 25, 2022
Int. Cl. G05B 19/423 (2006.01)

CPC G05B 19/423 (2013.01)

20 Claims

1. A method for training a control system, comprising:

receiving, by a computing system, an initial teacher policy based on a trajectory example generated by a human operator in a first action space, the trajectory example captured using one or more sensors monitoring movements of the human operator, the trajectory example comprising trajectory information of the human operator while performing a task to be learned by a control system of the computing system;

based on the initial teacher policy, generating, by the computing system, an initial student policy by training the control system to perform the task exemplified in the trajectory example, wherein the control system exists in a second action space that is of lower dimension than the first action space, wherein movements of the control system in the second action space are limited compared to movements of the human operator in the first action space, wherein training the control system comprises:

causing the control system to mimic the movements of the human operator while performing the task,

monitoring the movements of the control system using sensors, and

generating an output trajectory of the control system performing the task based on the monitored movements;

providing, by the computing system, the output trajectory of the control system to the human operator for determining a reproducibility of the trajectory example based on the output trajectory generated by the control system;

receiving, by the computing system, an updated teacher policy based on an updated trajectory example generated by the human operator responsive to the determined reproducibility of the trajectory example; and

based on the updated teacher policy, generating, by the computing system, an updated student policy by training the control system to perform the task exemplified in the updated trajectory example.
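
The teacher/student loop recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation; every name and number in it is a hypothetical chosen for illustration. It assumes the first action space is 3-D, the second action space is 2-D (obtained here by simply dropping the z coordinate), and the control system's limited movement is modeled as a cap on displacement per step. The function slow_down stands in for the human operator, who in the claim produces the updated trajectory example after reviewing the output trajectory; a simple mean tracking error stands in for the reproducibility determination.

```python
import math

def project(point3d):
    """Map a 3-D teacher point into the assumed 2-D student action space (drop z)."""
    x, y, _z = point3d
    return (x, y)

def run_student(teacher_traj, max_step):
    """Student mimics the teacher; each move is capped at max_step (limited movement).

    Returns the output trajectory, i.e. the positions a sensor would record.
    """
    pos = project(teacher_traj[0])
    output = [pos]
    for target3d in teacher_traj[1:]:
        tx, ty = project(target3d)
        dx, dy = tx - pos[0], ty - pos[1]
        dist = math.hypot(dx, dy)
        scale = 1.0 if dist <= max_step else max_step / dist
        pos = (pos[0] + dx * scale, pos[1] + dy * scale)
        output.append(pos)
    return output

def tracking_error(teacher_traj, output_traj):
    """Mean distance between projected teacher points and student points."""
    errs = [math.dist(project(t), o) for t, o in zip(teacher_traj, output_traj)]
    return sum(errs) / len(errs)

def slow_down(teacher_traj, factor):
    """Hypothetical operator response: repeat the demonstration more slowly,
    modeled as `factor` evenly spaced points per original segment."""
    out = []
    for a, b in zip(teacher_traj, teacher_traj[1:]):
        for k in range(factor):
            t = k / factor
            out.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    out.append(teacher_traj[-1])
    return out

MAX_STEP = 0.5  # assumed per-step movement limit of the control system

# Initial teacher trajectory example in the 3-D (first) action space.
initial_teacher = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5), (1.0, 1.0, 1.0), (0.0, 1.0, 0.5)]

# Initial student pass: mimic, record the output trajectory, measure the gap.
initial_output = run_student(initial_teacher, MAX_STEP)
initial_error = tracking_error(initial_teacher, initial_output)

# Operator reviews the output and supplies an updated, more reproducible example.
updated_teacher = slow_down(initial_teacher, factor=4)
updated_output = run_student(updated_teacher, MAX_STEP)
updated_error = tracking_error(updated_teacher, updated_output)

print(f"initial error: {initial_error:.3f}, updated error: {updated_error:.3f}")
```

In this sketch the initial demonstration moves farther per step than the student can follow, so the output trajectory lags behind it; the slower updated demonstration stays within the step limit and the student reproduces it. A real system would learn a policy from sensor data rather than replay waypoints, which this example deliberately omits.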