| CPC G06N 3/08 (2013.01) [B25J 9/161 (2013.01); B25J 9/163 (2013.01); B25J 9/1666 (2013.01); B25J 19/023 (2013.01); G06V 10/82 (2022.01); G06V 20/41 (2022.01); G06V 40/10 (2022.01); G06V 40/20 (2022.01)] | 18 Claims |

|
1. A method implemented by one or more processors, the method comprising:
generating a predicted sequence of terminals for use in controlling one or more actuators of a robot, wherein each of the terminals represents corresponding predicted activity in an environment of the robot at a corresponding future time step, and wherein generating the predicted sequence of terminals comprises:
receiving one or more observations, of the environment, captured using one or more sensors of the robot;
determining an initial non-terminal by processing the one or more observations using a machine learning model, wherein the initial non-terminal is an encoded representation of observed activity in the environment captured in the one or more observations;
for each of a plurality of iterations and until one or more conditions are satisfied:
processing a corresponding non-terminal using the machine learning model to determine a corresponding set of rules for the corresponding non-terminal,
wherein each rule, of the corresponding set of rules, includes a corresponding candidate terminal and a corresponding candidate next non-terminal, and
wherein the initial non-terminal is used as the corresponding non-terminal that is processed in an initial iteration of the iterations;
selecting a rule from the corresponding set of rules;
assigning the corresponding candidate terminal, for the selected rule, as the next terminal in the predicted sequence of terminals, and
assigning the corresponding candidate next non-terminal, for the selected rule, as the corresponding non-terminal for use in generating the predicted sequence of terminals in an immediately subsequent iteration of the iterations; and
controlling the one or more actuators of the robot based on the predicted sequence of terminals.
|
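
The iterative generation recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. The names `predict_terminals`, `rules_for`, `TOY_RULES`, and `max_steps` are hypothetical; the per-rule score, the highest-score selection policy, and the stopping conditions (a step limit, or a rule with no next non-terminal) are assumptions, since the claim only requires "selecting a rule" and "one or more conditions". A lookup table stands in for the machine learning model, and the encoding of observations into the initial non-terminal is omitted.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A rule pairs a candidate terminal with a candidate next non-terminal.
# The leading score is an assumption used only to choose among rules.
Rule = Tuple[float, str, Optional[int]]


def predict_terminals(
    initial_non_terminal: int,
    rules_for: Callable[[int], Sequence[Rule]],
    max_steps: int,
) -> List[str]:
    """Expand non-terminals one rule at a time into a sequence of terminals."""
    terminals: List[str] = []
    non_terminal: Optional[int] = initial_non_terminal
    # Assumed stopping conditions: step limit reached, or no next non-terminal.
    while non_terminal is not None and len(terminals) < max_steps:
        rules = rules_for(non_terminal)  # stand-in for the model's rule output
        if not rules:
            break
        # Assumed selection policy: take the highest-scoring rule.
        _, terminal, next_non_terminal = max(rules, key=lambda rule: rule[0])
        terminals.append(terminal)        # next terminal in the sequence
        non_terminal = next_non_terminal  # used in the following iteration
    return terminals


# Toy rule table standing in for the machine learning model (hypothetical).
TOY_RULES = {
    0: [(0.7, "reach", 1), (0.3, "wait", 0)],
    1: [(0.9, "grasp", 2), (0.1, "reach", 1)],
    2: [(1.0, "lift", None)],
}


def toy_rules_for(non_terminal: int) -> Sequence[Rule]:
    return TOY_RULES.get(non_terminal, [])


sequence = predict_terminals(0, toy_rules_for, max_steps=10)
print(sequence)  # ['reach', 'grasp', 'lift']
```

In this sketch the returned list plays the role of the predicted sequence of terminals that a controller would then use to drive the actuators. Sampling a rule in proportion to its score, rather than always taking the maximum, would be an equally plausible selection policy under the claim language.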