US 11,697,205 B2
Determining control policies for robots with noise-tolerant structured exploration
Vikas Sindhwani, Hastings-on-Hudson, NY (US); Atil Iscen, Brooklyn, NY (US); and Krzysztof Marcin Choromanski, New York, NY (US)
Assigned to Google LLC, Mountain View, CA (US)
Appl. No. 16/649,598
Filed by GOOGLE LLC, Mountain View, CA (US)
PCT Filed Sep. 21, 2018, PCT No. PCT/US2018/052226 § 371(c)(1), (2) Date Mar. 20, 2020, PCT Pub. No. WO2019/060730, PCT Pub. Date Mar. 28, 2019.
Claims priority of provisional application 62/599,552, filed on Dec. 15, 2017.
Claims priority of provisional application 62/562,286, filed on Sep. 22, 2017.
Prior Publication US 2020/0276704 A1, Sep. 3, 2020
Int. Cl. G06N 3/08 (2023.01); B25J 9/16 (2006.01); G06N 20/00 (2019.01)

CPC B25J 9/163 (2013.01) [B25J 9/1661 (2013.01); B25J 9/1671 (2013.01); G06N 3/08 (2013.01); G06N 20/00 (2019.01)]

20 Claims

1. A method for determining a control policy for a robot, the method comprising:

receiving a specification in the form of an objective function ƒ of a task to be performed by the robot,

wherein ƒ is a scalar-valued objective function of a control policy, and the control policy determines, for each state s of the robot, an action u to be performed by the robot, and

wherein the objective function ƒ represents how well the robot performs the task;

determining the control policy for the robot to perform the task by simulating operation of the robot to determine the control policy x* by solving an optimization problem of the form

max_{x ∈ ℝⁿ} ƒ(x),

wherein x ∈ ℝⁿ is a state of the control policy encountered during simulating the operation of the robot, and wherein solving the optimization problem comprises estimating gradients of the objective function ƒ using a finite difference procedure to estimate the gradients in perturbation directions defined by rows of a balanced spinner, the balanced spinner being a two-dimensional matrix such that the absolute value of an inner product between any one of multiple rows of the two-dimensional matrix and any one of multiple columns of the two-dimensional matrix is bounded by a threshold value, and

causing the robot to operate under control of the determined control policy x*.
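
The gradient-estimation step recited in claim 1 can be illustrated with a minimal Python sketch. It is not taken from the patent and rests on several assumptions: a normalized Sylvester-Hadamard matrix stands in as one example of a structured matrix whose row-column inner products are bounded (here by 1); a central (two-sided) finite difference is used as the finite difference procedure; plain gradient ascent is used as the optimizer; and `toy_objective` is a hypothetical quadratic score replacing a real robot simulation. All function names are illustrative.

```python
import numpy as np


def sylvester_hadamard(n):
    """Build an n x n Hadamard matrix with +1/-1 entries (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h


def max_row_column_inner_product(m):
    """Largest |<row i, column j>| over all i, j of a square matrix m."""
    # Entry (i, j) of m @ m is exactly the inner product of row i and column j.
    return float(np.abs(m @ m).max())


def estimate_gradient(f, x, directions, step=1e-2):
    """Central finite differences of f along each row of `directions`.

    Assumes the rows are orthonormal, so summing (directional derivative
    times direction) recovers the gradient without extra scaling.
    """
    derivs = np.array(
        [(f(x + step * d) - f(x - step * d)) / (2.0 * step) for d in directions]
    )
    return directions.T @ derivs


def optimize_policy(f, x0, directions, learning_rate=0.1, iterations=100):
    """Plain gradient ascent on f using the finite-difference estimates."""
    x = np.array(x0, dtype=float)
    for _ in range(iterations):
        x = x + learning_rate * estimate_gradient(f, x, directions)
    return x


n = 8
# Assumed example of a structured perturbation matrix: Hadamard rows, unit norm.
spinner = sylvester_hadamard(n) / np.sqrt(n)
target = np.linspace(-1.0, 1.0, n)  # hypothetical "ideal" policy parameters


def toy_objective(x):
    # Stand-in for a simulated rollout score: higher is better.
    return -float(np.sum((x - target) ** 2))


x_star = optimize_policy(toy_objective, np.zeros(n), spinner)
print("max |<row, column>|:", max_row_column_inner_product(spinner))
print("distance to target:", np.linalg.norm(x_star - target))
```

In this sketch the perturbation directions are deterministic and mutually orthogonal, so n function-pair evaluations cover all n coordinates of the policy parameters. A real use would replace `toy_objective` with a (possibly noisy) simulator rollout and would likely need a different step size and learning rate.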