US 12,172,309 B2
Systems and methods for learning agile locomotion for multiped robots
Jie Tan, Mountain View, CA (US); Tingnan Zhang, San Jose, CA (US); Atil Iscen, Mountain View, CA (US); Erwin Coumans, Foster City, CA (US); and Yunfei Bai, Fremont, CA (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Appl. No. 17/047,892
Filed by Google LLC, Mountain View, CA (US)
PCT Filed Apr. 22, 2019, PCT No. PCT/US2019/028454
§ 371(c)(1), (2) Date Oct. 15, 2020,
PCT Pub. No. WO2019/209681, PCT Pub. Date Oct. 31, 2019.
Claims priority of provisional application 62/661,055, filed on Apr. 22, 2018.
Prior Publication US 2021/0162589 A1, Jun. 3, 2021
Int. Cl. B62D 57/032 (2006.01); B25J 9/16 (2006.01); G05D 1/00 (2024.01); G06N 3/042 (2023.01); G06N 3/08 (2023.01)
CPC B25J 9/161 (2013.01) [B25J 9/1605 (2013.01); B25J 9/163 (2013.01); B25J 9/1664 (2013.01); B25J 9/1669 (2013.01); B62D 57/032 (2013.01); G05D 1/0891 (2013.01); G06N 3/042 (2023.01); G06N 3/08 (2013.01)] 18 Claims
 
1. A method implemented by one or more processors, comprising:
receiving an instance of sensor data, the instance of sensor data generated based on output from one or more sensors of a robot,
wherein the instance of sensor data is based on a state of the robot following control of the robot using a prior instance of robot control parameters generated by applying a first observation and a first reference trajectory to a neural network model, the neural network model representing a learned policy for a robotic locomotion task and being previously trained through reinforcement learning, wherein the first reference trajectory indicates a first gait for the robotic locomotion task, is decoupled from the sensor data and is influenced by a first user interaction via one or more user interface input devices, and wherein the learned policy can be varied;
receiving a second reference trajectory indicating a second gait for the robotic locomotion task, wherein the second reference trajectory is decoupled from the sensor data and is influenced by a second user interaction via the one or more user interface input devices, and wherein the second gait is distinct from the first gait;
generating a second observation based on the instance of sensor data, wherein the second observation includes data missing from the second reference trajectory;
applying the second observation and the second reference trajectory to the neural network model to generate a current instance of robot control parameters; and
controlling one or more actuators of a robot based on the current instance of robot control parameters.
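The control loop recited in claim 1 can be sketched as follows. The claim does not specify any of the following, so all of them are assumptions made for illustration:

- the network architecture;
- the gait parameterization;
- the observation contents;
- how the network output becomes actuator commands.

Every name in this sketch is hypothetical: `reference_trajectory`, `make_observation`, `TinyPolicy`, `control_step`, the "trot" and "pace" gait table, and the eight-motor layout. `TinyPolicy` uses fixed random weights as a stand-in for a policy trained through reinforcement learning. Treating the network output as a correction added to the reference trajectory is also an assumption, not the claimed method.

```python
import math
import random

NUM_MOTORS = 8  # assumption: eight actuated joints (two per leg on a quadruped)

# Hypothetical gait table: per-leg phase offsets, as a fraction of one cycle.
GAIT_PHASE_OFFSETS = {
    "trot": [0.0, 0.5, 0.5, 0.0],  # diagonal leg pairs move together
    "pace": [0.0, 0.5, 0.0, 0.5],  # same-side leg pairs move together
}


def reference_trajectory(gait, t, freq_hz=2.0, amplitude=0.3):
    """Open-loop target angles for the user-selected gait at time t.

    Depends only on the gait and the clock, never on sensor data, which
    mirrors the claim's "decoupled from the sensor data" language.
    """
    angles = []
    for offset in GAIT_PHASE_OFFSETS[gait]:
        phase = 2.0 * math.pi * (freq_hz * t + offset)
        angles.append(amplitude * math.sin(phase))        # swing joint
        angles.append(0.5 * amplitude * math.cos(phase))  # extension joint
    return angles


def make_observation(sensor_data):
    """Observation built from sensor readings that the reference lacks."""
    return [sensor_data["roll"], sensor_data["pitch"],
            sensor_data["roll_rate"], sensor_data["pitch_rate"]]


class TinyPolicy:
    """Stand-in for the trained network: one tanh hidden layer, random weights."""

    def __init__(self, in_dim, hidden, out_dim, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                   for _ in range(hidden)]
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)]
                   for _ in range(out_dim)]

    def __call__(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)))
                  for row in self.w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]


def control_step(policy, sensor_data, gait, t):
    """One iteration: (observation, reference trajectory) -> control parameters."""
    reference = reference_trajectory(gait, t)
    observation = make_observation(sensor_data)
    # The network receives both the observation and the reference trajectory.
    correction = policy(observation + reference)
    # Assumption: output is a correction added to the open-loop reference.
    return [r + c for r, c in zip(reference, correction)]


# Usage: the same policy handles a user-driven switch from one gait to another.
policy = TinyPolicy(in_dim=4 + NUM_MOTORS, hidden=16, out_dim=NUM_MOTORS)
sensors = {"roll": 0.02, "pitch": -0.01, "roll_rate": 0.1, "pitch_rate": 0.0}
first = control_step(policy, sensors, "trot", t=0.1)   # first gait
second = control_step(policy, sensors, "pace", t=0.1)  # second, distinct gait
```

Switching gaits changes only the reference trajectory input. The policy object is unchanged, which matches the claim's use of a single previously trained network for both the first and the second gait. On a physical robot, the returned values would be sent to the motor controllers. That interface is hardware-specific and is omitted here.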