CPC G06F 18/2148 (2023.01) [B25J 9/1671 (2013.01); G06N 3/045 (2023.01); G06N 3/08 (2013.01); G06N 20/00 (2019.01)] | 17 Claims |
1. A method implemented by one or more processors, the method comprising:
generating a sequence of instances of observed state data, each of the instances of the observed state data of the sequence defining one or both of:
a corresponding measured value for each of one or more properties of a real robot or simulated robot, and
a corresponding state for each of one or more real or simulated environmental objects;
sequentially processing the sequence of instances of observed state data using a trained difference model, wherein the trained difference model is a trained neural network model;
extracting a difference representation from the trained difference model after sequentially processing the sequence of instances of observed state data, wherein extracting the difference representation from the trained difference model comprises extracting the difference representation from a layer of the trained difference model after sequentially processing the sequence of instances of observed state data; and
applying the difference representation to a policy model, along with additional instances of observed state data, in generating predicted actions for control of the real robot or the simulated robot.
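The data flow recited in claim 1 can be illustrated with a minimal, non-authoritative sketch. It assumes a simple Elman-style recurrent network as the difference model and a linear policy over the concatenated observation and representation; the claim only requires "a trained neural network model", so these architectures, and all names such as `ObservedState`, `DifferenceModel`, and `PolicyModel`, are hypothetical. The weights are random placeholders rather than trained values, since training is outside the scope of the claim.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ObservedState:
    # Hypothetical container for one instance of observed state data:
    # measured robot properties and/or environmental object states.
    robot_properties: Optional[List[float]] = None
    object_states: Optional[List[float]] = None

    def as_vector(self) -> List[float]:
        # Concatenate whichever parts are present ("one or both of").
        return (self.robot_properties or []) + (self.object_states or [])


def _matvec(w: List[List[float]], x: Sequence[float]) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def _rand_matrix(rows: int, cols: int, rng: random.Random) -> List[List[float]]:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class DifferenceModel:
    """Elman-style RNN standing in for the trained difference model (assumption)."""

    def __init__(self, obs_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.obs_dim = obs_dim
        self.hidden_dim = hidden_dim
        # Random placeholder weights; a real system would load trained values.
        self.w_in = _rand_matrix(hidden_dim, obs_dim, rng)
        self.w_rec = _rand_matrix(hidden_dim, hidden_dim, rng)

    def process_sequence(self, states: Sequence[ObservedState]) -> List[float]:
        h = [0.0] * self.hidden_dim
        for s in states:  # sequential processing, one instance at a time
            x = s.as_vector()
            if len(x) != self.obs_dim:
                raise ValueError(f"expected {self.obs_dim} values, got {len(x)}")
            pre = [a + b for a, b in zip(_matvec(self.w_in, x), _matvec(self.w_rec, h))]
            h = [math.tanh(v) for v in pre]
        # Extract the hidden layer's final activations as the difference representation.
        return h


class PolicyModel:
    """Linear policy over [observation ; difference representation] (assumption)."""

    def __init__(self, obs_dim: int, rep_dim: int, action_dim: int, seed: int = 1):
        rng = random.Random(seed)
        self.w = _rand_matrix(action_dim, obs_dim + rep_dim, rng)

    def predict_action(self, obs: ObservedState, rep: Sequence[float]) -> List[float]:
        # The representation is applied alongside each additional observation.
        x = obs.as_vector() + list(rep)
        return [math.tanh(v) for v in _matvec(self.w, x)]


# Usage: summarize a short history, then condition the policy on it.
history = [ObservedState([0.1 * t, 0.0], [1.0]) for t in range(5)]
diff_model = DifferenceModel(obs_dim=3, hidden_dim=4)
rep = diff_model.process_sequence(history)
policy = PolicyModel(obs_dim=3, rep_dim=4, action_dim=2)
action = policy.predict_action(ObservedState([0.5, 0.0], [1.0]), rep)
print(action)
```

In this sketch the representation is computed once from the history and then reused for every later observation, which matches the claim's separation between the sequence processed by the difference model and the "additional instances" fed to the policy.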