CPC G05B 19/423 (2013.01) | 20 Claims |
1. A method for training a control system, comprising:
receiving, by a computing system, an initial teacher policy based on a trajectory example generated by a human operator in a first action space, the trajectory example captured using one or more sensors monitoring movements of the human operator, the trajectory example comprising trajectory information of the human operator while performing a task to be learned by a control system of the computing system;
based on the initial teacher policy, generating, by the computing system, an initial student policy by training the control system to perform the task exemplified in the trajectory example, wherein the control system exists in a second action space that is of lower dimension than the first action space, wherein movements of the control system in the second action space are limited compared to movements of the human operator in the first action space, wherein training the control system comprises:
causing the control system to mimic the movements of the human operator while performing the task,
monitoring the movements of the control system using sensors, and
generating an output trajectory of the control system performing the task based on the monitored movements;
providing, by the computing system, the output trajectory of the control system to the human operator for determining a reproducibility of the trajectory example based on the output trajectory generated by the control system;
receiving, by the computing system, an updated teacher policy based on an updated trajectory example generated by the human operator responsive to the determined reproducibility of the trajectory example; and
based on the updated teacher policy, generating, by the computing system, an updated student policy by training the control system to perform the task exemplified in the updated trajectory example.
|
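The teacher/student loop recited in claim 1 can be illustrated with a small sketch. This is not the claimed implementation: the claim does not specify a training algorithm, so dropping one axis and clamping to a reach limit merely stands in for "training", and the operator's response is simulated by rescaling the demonstration. All names (`LIMIT`, `train_student`, `reproducibility_error`, `operator_update`) and the 3-D to 2-D action spaces are assumptions chosen for illustration.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]  # first action space (human operator), assumed 3-D
Point2 = Tuple[float, float]         # second action space (control system), assumed 2-D

LIMIT = 1.0  # hypothetical per-axis reach limit of the control system


def train_student(teacher: List[Point3]) -> List[Point2]:
    """Stand-in for training: mimic the teacher trajectory by dropping the
    z axis and clamping x and y to the student's limited range. Returns the
    output trajectory the control system would produce."""
    def clamp(v: float) -> float:
        return max(-LIMIT, min(LIMIT, v))
    return [(clamp(x), clamp(y)) for x, y, _ in teacher]


def reproducibility_error(teacher: List[Point3], output: List[Point2]) -> float:
    """Mean Euclidean distance between the teacher's (x, y) positions and
    the student's output trajectory; 0.0 means fully reproducible."""
    total = 0.0
    for (tx, ty, _), (sx, sy) in zip(teacher, output):
        total += ((tx - sx) ** 2 + (ty - sy) ** 2) ** 0.5
    return total / len(teacher)


def operator_update(teacher: List[Point3]) -> List[Point3]:
    """Simulated human response: re-demonstrate the task scaled down so that
    every x and y coordinate fits inside the student's reach."""
    peak = max(max(abs(x), abs(y)) for x, y, _ in teacher)
    scale = 1.0 if peak <= LIMIT else LIMIT / peak
    return [(x * scale, y * scale, z * scale) for x, y, z in teacher]


def run_example() -> Tuple[float, float]:
    """One pass of the loop: initial policy, feedback, updated policy."""
    demo = [(0.0, 0.0, 0.5), (1.0, 0.5, 0.5), (2.0, 1.0, 0.5)]
    initial_output = train_student(demo)                  # initial student policy
    initial_error = reproducibility_error(demo, initial_output)
    updated_demo = operator_update(demo)                  # updated teacher policy
    updated_output = train_student(updated_demo)          # updated student policy
    updated_error = reproducibility_error(updated_demo, updated_output)
    return initial_error, updated_error


if __name__ == "__main__":
    before, after = run_example()
    print(f"reproducibility error before update: {before:.3f}, after: {after:.3f}")
```

In this sketch the feedback step matters because the student cannot reach part of the original demonstration; once the simulated operator re-demonstrates within the student's limits, the output trajectory matches the updated example.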