US 12,326,738 B2
Method for controlling motions of quadruped robot based on reinforcement learning and position increment
Wei Zhang, Jinan (CN); Jiapeng Sheng, Jinan (CN); Yanyun Chen, Jinan (CN); Xing Fang, Jinan (CN); and Wenhao Tan, Jinan (CN)
Assigned to SHANDONG UNIVERSITY, Jinan (CN)
Appl. No. 18/266,694
Filed by SHANDONG UNIVERSITY, Shandong (CN)
PCT Filed Oct. 18, 2022, PCT No. PCT/CN2022/125983 § 371(c)(1), (2) Date Jun. 12, 2023, PCT Pub. No. WO2023/159978, PCT Pub. Date Aug. 31, 2023.
Claims priority of application No. 202210191785.X (CN), filed on Feb. 28, 2022.
Prior Publication US 2025/0021109 A1, Jan. 16, 2025
Int. Cl. G05D 1/646 (2024.01); B62D 57/032 (2006.01)

CPC G05D 1/646 (2024.01) [B62D 57/032 (2013.01)]

6 Claims

1. A method for controlling motions of a quadruped robot based on reinforcement learning and position increment, comprising:

acquiring motion environment information, quadruped robot attitude information, and foot sole position information;

based on the acquired information, generating foot sole positions of the quadruped robot during motions within all preset time steps, and calculating a change of the foot sole positions in all the time steps;

taking a maximum moving distance within a single time step as a constraint, and accumulating the time steps at the same time to obtain a foot sole position trajectory;

controlling the quadruped robot to perform corresponding actions based on the foot sole position trajectory combined with a preset reward function, so as to keep motion balance of the quadruped robot;

acquiring and processing joint state historical information and leg phase information of the quadruped robot as a control input of the quadruped robot, to obtain a next action command to control the motions of the quadruped robot, wherein

a pressure sensor is not provided on each foot sole of the quadruped robot, the joint state historical information being used as an input of a reinforcement learning policy, to achieve a detection of each foot sole of the quadruped robot to ground contact, and

the joint state historical information includes a joint position error and a joint velocity, the joint position error being a deviation between a current joint position and a previous joint position instruction;

outputting, by an independent policies modulating trajectory generator (PMTG) for each leg, a foot sole position of each leg of the quadruped robot in a Z-axis direction; wherein, the PMTG is defined to simulate a basic stepping gait mode by using cubic Hermite spline, and an equation is as follows:

where, k=2(ϕ−π)/π, h is a maximum allowable foot raising height, and ϕ∈[0,2π) is a TG phase;

outputting a foot sole position increment and an adjusting frequency of each leg of the quadruped robot based on the reinforcement learning policy; and, a target foot sole position (x, y, z)_t at a time t is obtained by the following equation:

where, foot sole positions in an X-axis direction and a Y-axis direction are obtained by accumulating foot sole position increments (Δx, Δy) in the X-axis and Y-axis directions output by the reinforcement learning policy; and, a foot sole position in the Z-axis direction is obtained by superposing a foot sole position increment Δz in the Z-axis direction output by the reinforcement learning policy and a priori value provided by the PMTG.
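
The per-leg update described in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and the claim's equations are not reproduced in this text, so several parts are assumptions: the piecewise cubic Hermite height profile in `tg_height` is a commonly used form that is consistent with k=2(ϕ−π)/π and peaks at h; the per-axis clamp stands in for the "maximum moving distance within a single time step" constraint; and the phase update, along with the names `tg_height`, `step_foot_target`, `base_freq`, `freq_adjust`, `max_step`, and `dt`, is hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def tg_height(phi, h):
    """Assumed cubic Hermite stepping profile for the Z-axis prior.

    With k = 2(phi - pi)/pi, the foot rises for k in [0, 1], descends for
    k in (1, 2], and stays on the ground (height 0) otherwise.
    """
    k = 2.0 * (phi - math.pi) / math.pi
    if 0.0 <= k <= 1.0:
        return h * (-2.0 * k**3 + 3.0 * k**2)
    if 1.0 < k <= 2.0:
        return h * (2.0 * k**3 - 9.0 * k**2 + 12.0 * k - 4.0)
    return 0.0


def step_foot_target(state, delta, freq_adjust, *, h=0.1, max_step=0.02,
                     base_freq=1.5, dt=0.01):
    """Advance one leg by one time step.

    state       -- (x, y, phi): accumulated X/Y foot position and TG phase
    delta       -- (dx, dy, dz): position increments output by the policy
    freq_adjust -- adjusting frequency output by the policy (Hz, assumed unit)
    Returns the new state and the target foot position (x, y, z).
    """
    x, y, phi = state
    # Assumption: the step-size constraint is applied per axis by clamping.
    dx, dy, dz = (max(-max_step, min(max_step, d)) for d in delta)
    # Assumption: the adjusting frequency shifts a fixed base frequency.
    phi = (phi + TWO_PI * (base_freq + freq_adjust) * dt) % TWO_PI
    # X and Y accumulate increments over time steps.
    x += dx
    y += dy
    # Z superposes the increment on the trajectory generator's prior.
    z = tg_height(phi, h) + dz
    return (x, y, phi), (x, y, z)


if __name__ == "__main__":
    state = (0.0, 0.0, 0.0)
    for _ in range(5):
        state, target = step_foot_target(state, (0.01, 0.0, 0.0), 0.0)
        print(target)
```

Because X and Y are accumulated from bounded increments, the target position cannot jump by more than `max_step` per axis between consecutive time steps, whatever the policy outputs. Z is bounded only relative to the prior height from the trajectory generator.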