US 11,772,272 B2
System(s) and method(s) of using imitation learning in training and refining robotic control policies
Seyed Mohammad Khansari Zadeh, San Carlos, CA (US); Eric Jang, Cupertino, CA (US); Daniel Lam, Mountain View, CA (US); Daniel Kappler, San Francisco, CA (US); Matthew Bennice, San Jose, CA (US); Brent Austin, Munich (DE); Yunfei Bai, Fremont, CA (US); Sergey Levine, Berkeley, CA (US); Alexander Irpan, Palo Alto, CA (US); Nicolas Sievers, El Cerrito, CA (US); and Chelsea Finn, Berkeley, CA (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by GOOGLE LLC, Mountain View, CA (US)
Filed on Mar. 16, 2021, as Appl. No. 17/203,296.
Prior Publication US 2022/0297303 A1, Sep. 22, 2022
Int. Cl. B25J 9/16 (2006.01); B25J 13/06 (2006.01)
CPC B25J 9/1697 (2013.01) [B25J 9/161 (2013.01); B25J 9/163 (2013.01); B25J 9/1661 (2013.01); B25J 13/06 (2013.01)]
19 Claims
 
1. A method implemented using one or more processors, the method comprising:
receiving, from one or more vision components of a robot, an instance of vision data capturing an environment of the robot, the instance of the vision data being captured during performance of a robotic task by the robot;
processing, using a robotic control policy, the instance of the vision data to generate a sequence of actions to be performed by the robot during the robotic task, the sequence of actions including an initial action to be performed by the robot in furtherance of the robotic task and a plurality of predicted actions that are predicted to follow the initial action;
determining, based on processing the instance of the vision data using the robotic control policy, whether the robot will fail in performance of the robotic task;
in response to determining that the robot will fail in performance of the robotic task:
causing a prompt to be rendered via an interface of a computing device or the robot, the prompt requesting a user of the computing device intervene in performance of the robotic task;
receiving, from a user of the computing device, and based on the prompt, user input that intervenes with performance of the robotic task, the user input being received via an input device of the computing device or an additional computing device; and
causing the robotic control policy to be updated based on the user input;
in response to determining that the robot will not fail in performance of the robotic task, causing the robot to perform the initial action; and
until the robot completes performance of the robotic task:
receiving, from one or more of the vision components of the robot, an additional instance of vision data capturing the environment of the robot, the additional instance of the vision data being captured during performance of the robotic task by the robot;
processing, using the robotic control policy, the additional instance of the vision data to generate an additional sequence of actions to be performed by the robot during the robotic task, the additional sequence of actions including a next action to be performed by the robot in furtherance of the robotic task and an additional plurality of predicted actions that are predicted to follow the next action; and
determining, based on processing the additional instance of the vision data using the robotic control policy, whether the robot will fail in performance of the robotic task.
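
For illustration only, the control flow recited in claim 1 can be sketched as a loop: receive an instance of vision data, process it with the policy to obtain an action sequence and a failure determination, then either prompt a user and update the policy from the user input, or perform the first action of the sequence. This is a minimal sketch under stated assumptions, not the patented implementation. `ToyPolicy`, `PolicyOutput`, `run_task`, the mean-value failure score, the action names, and the convention that the task ends when the frames run out are all hypothetical and do not appear in the claim.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Sequence, Tuple


@dataclass
class PolicyOutput:
    # actions[0] is the initial (or next) action; the rest are the
    # actions predicted to follow it.
    actions: List[str]
    will_fail: bool


@dataclass
class ToyPolicy:
    """Hypothetical stand-in for the robotic control policy."""
    failure_threshold: float = 0.5
    corrections: List[Tuple[Tuple[float, ...], str]] = field(default_factory=list)

    def process(self, vision: Sequence[float]) -> PolicyOutput:
        # Assumption: a toy failure score, the mean of the vision values.
        score = sum(vision) / len(vision)
        return PolicyOutput(
            actions=["reach", "grasp", "lift"],
            will_fail=score > self.failure_threshold,
        )

    def update(self, vision: Sequence[float], user_action: str) -> None:
        # Assumption: "updating" just stores the correction; a real system
        # would use it as imitation-learning training data.
        self.corrections.append((tuple(vision), user_action))


def run_task(
    policy: ToyPolicy,
    frames: Iterable[Sequence[float]],
    ask_user: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Run one task; each frame is one instance of vision data."""
    log: List[Tuple[str, str]] = []
    for vision in frames:  # assumption: task completes when frames run out
        out = policy.process(vision)
        if out.will_fail:
            # Render a prompt, receive the user input, update the policy.
            user_action = ask_user("Predicted failure: please intervene.")
            policy.update(vision, user_action)
            log.append(("user", user_action))
        else:
            # Perform only the first action, then re-plan on the next frame.
            log.append(("policy", out.actions[0]))
    return log


frames = [[0.1, 0.2], [0.9, 0.8], [0.3, 0.1]]
policy = ToyPolicy()
log = run_task(policy, frames, ask_user=lambda prompt: "nudge_left")
print(log)
```

In this sketch only the first action of each generated sequence is performed before the next instance of vision data is processed, which mirrors the claim's repeated receive, process, and determine steps; the predicted follow-on actions are generated but not executed directly.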