US 12,226,920 B2
System(s) and method(s) of using imitation learning in training and refining robotic control policies
Seyed Mohammad Khansari Zadeh, San Carlos, CA (US); Eric Jang, Cupertino, CA (US); Daniel Lam, Mountain View, CA (US); Daniel Kappler, San Francisco, CA (US); Matthew Bennice, San Jose, CA (US); Brent Austin, Munich (DE); Yunfei Bai, Fremont, CA (US); Sergey Levine, Berkeley, CA (US); Alexander Irpan, Palo Alto, CA (US); Nicolas Sievers, El Cerrito, CA (US); and Chelsea Finn, Berkeley, CA (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by GOOGLE LLC, Mountain View, CA (US)
Filed on Aug. 11, 2023, as Appl. No. 18/233,251.
Application 18/233,251 is a continuation of application No. 17/203,296, filed on Mar. 16, 2021, granted, now Pat. No. 11,772,272.
Prior Publication US 2023/0381970 A1, Nov. 30, 2023
Int. Cl. B25J 9/16 (2006.01); B25J 13/06 (2006.01)
CPC B25J 9/1697 (2013.01) [B25J 9/161 (2013.01); B25J 9/163 (2013.01); B25J 9/1661 (2013.01); B25J 13/06 (2013.01)] 19 Claims
OG exemplary drawing
 
1. A method implemented using one or more processors, the method comprising:
receiving, from one or more vision components of a robot, an instance of vision data capturing an environment of the robot, the vision data being captured during performance of a robotic task by the robot;
processing, using a robotic control policy, the instance of the vision data to generate a sequence of actions to be performed by the robot during the robotic task, wherein the sequence of actions includes an initial action to be performed by the robot in furtherance of the robotic task and a plurality of predicted actions that follow the initial action, and wherein processing, using the robotic control policy, the instance of the vision data to generate the sequence of actions to be performed by the robot during the robotic task comprises:
processing, using an intermediate portion of the robotic control policy, the instance of the vision data to generate an intermediate representation of the instance of the vision data;
processing, using a first control head of the robotic control policy, the intermediate representation of the instance of the vision data to generate, for the initial action and for each of the plurality of predicted actions that follow the initial action, a corresponding first set of values for a first portion of control of a component of the robot;
processing, using a second control head of the robotic control policy, the intermediate representation of the instance of the vision data to generate, for the initial action and for each of the plurality of predicted actions that follow the initial action, a corresponding second set of values for a second portion of control of the component of the robot; and
processing, using a third control head of the robotic control policy, the intermediate representation of the instance of the vision data to generate, for the initial action and for each of the plurality of predicted actions that follow the initial action, a corresponding third set of values for a third portion of control of the component of the robot;
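The shared intermediate portion feeding three separate control heads, each emitting a value set for the initial action and for every predicted action that follows it, can be sketched as below. This is a minimal NumPy illustration, not the patent's implementation: all dimensions, the head names, and the assignment of heads to translation/rotation/gripper control are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen for illustration only.
VISION_DIM = 64        # flattened instance of vision data
INTERMEDIATE_DIM = 32  # size of the shared intermediate representation
SEQ_LEN = 5            # the initial action + 4 predicted actions that follow it
HEAD_DIMS = {          # one value set per "portion of control" of a component
    "head1_translation": 3,   # e.g., x/y/z deltas (assumed semantics)
    "head2_rotation": 3,      # e.g., roll/pitch/yaw deltas (assumed)
    "head3_gripper": 1,       # e.g., open/close amount (assumed)
}

# Intermediate portion: maps the vision-data instance to a shared representation.
W_trunk = rng.standard_normal((VISION_DIM, INTERMEDIATE_DIM)) * 0.1

# One linear control head per portion of control; each head predicts a value
# set for every action step in the sequence.
W_heads = {name: rng.standard_normal((INTERMEDIATE_DIM, SEQ_LEN * dim)) * 0.1
           for name, dim in HEAD_DIMS.items()}

def policy_forward(vision_instance):
    """Return {head_name: (SEQ_LEN, dim) array of per-action value sets}."""
    intermediate = np.tanh(vision_instance @ W_trunk)
    return {name: (intermediate @ W).reshape(SEQ_LEN, HEAD_DIMS[name])
            for name, W in W_heads.items()}

actions = policy_forward(rng.standard_normal(VISION_DIM))
# actions["head1_translation"][0] is the first head's value set for the initial
# action; rows 1..SEQ_LEN-1 correspond to the predicted actions that follow it.
```

Predicting the whole sequence in one forward pass (rather than one action per pass) is what lets the claimed system render the upcoming actions to a user before they are executed.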
causing, based on the sequence of actions to be performed, the robot to initiate performance of the robotic task;
during performance of the robotic task:
causing a representation of the sequence of actions to be visually rendered via a graphical user interface of a computing device; and
receiving, from a user of the computing device, and based on the representation of the sequence of actions, user input that intervenes with ongoing performance of the robotic task, the user input being received via the computing device or an additional computing device; and
causing the robotic control policy to be updated based on the user input.
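The intervene-and-update loop at the end of the claim can be sketched as a DAgger-style imitation step: when the user's input corrects the rendered sequence, the policy is nudged toward the user's action for that observation. The squared-error update rule, the linear policy stand-in, and all dimensions below are assumptions for illustration; the patent does not specify the update rule.

```python
import numpy as np

rng = np.random.default_rng(1)

# Minimal linear policy stand-in: maps an observation to one action.
OBS_DIM, ACT_DIM, LR = 8, 3, 0.5
W = np.zeros((OBS_DIM, ACT_DIM))

def policy(obs):
    return obs @ W

def imitation_update(obs, user_action):
    """One gradient step of squared-error imitation toward the user's action
    (a DAgger-style correction; an assumed stand-in for the claimed update)."""
    global W
    error = policy(obs) - user_action   # prediction minus demonstration
    W -= LR * np.outer(obs, error)      # gradient of 0.5 * ||error||^2 w.r.t. W

# Simulated intervention: the user sees the rendered action sequence and
# supplies a corrective action via the GUI for the current observation.
obs = rng.standard_normal(OBS_DIM)
obs /= np.linalg.norm(obs)              # normalized so LR gives a stable step
user_action = rng.standard_normal(ACT_DIM)

for _ in range(50):                     # repeated updates converge
    imitation_update(obs, user_action)

# After updating, the policy reproduces the user's correction for this state.
```

Collecting corrections on states the robot actually visits, rather than only on expert demonstrations, is the core idea behind this style of interactive imitation learning.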