CPC B25J 9/163 (2013.01) [B25J 9/1661 (2013.01); B25J 9/1671 (2013.01); G06N 3/08 (2013.01); G06N 20/00 (2019.01)] | 20 Claims |
1. A method for determining a control policy for a robot, the method comprising:
receiving a specification in the form of an objective function ƒ of a task to be performed by the robot,
wherein ƒ is a scalar-valued objective function of a control policy, and the control policy determines, for each state s of the robot, an action u to be performed by the robot, and
wherein the objective function ƒ represents how well the robot performs the task;
determining the control policy for the robot to perform the task by simulating operation of the robot to determine the control policy x* by solving an optimization problem of the form
max_{x∈ℝ^n} ƒ(x),
wherein x∈ℝ^n is a state of the control policy encountered during simulating the operation of the robot, and wherein solving the optimization problem comprises estimating gradients of the objective function ƒ using a finite difference procedure to estimate the gradients in perturbation directions defined by rows of a balanced spinner, the balanced spinner being a two-dimensional matrix such that the absolute value of an inner product between any one of multiple rows of the two-dimensional matrix and any one of multiple columns of the two-dimensional matrix is bounded by a threshold value, and
causing the robot to operate under control of the determined control policy x*.
|
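The gradient-estimation step recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and several pieces are assumptions made only for illustration: a Sylvester-type Hadamard matrix with +1/-1 entries stands in for the balanced spinner, a central (two-sided) finite difference is chosen as the finite difference procedure, a toy quadratic objective replaces the simulated robot rollout, and plain gradient ascent is used as the outer optimizer. All function names (`sylvester_hadamard`, `max_row_column_inner_product`, `estimate_gradient`, `optimize_policy`, `toy_objective`) are hypothetical.

```python
def sylvester_hadamard(n):
    """Return an n x n matrix with +1/-1 entries (n must be a power of two).

    Assumption: used here only as a stand-in for a structured matrix
    whose rows serve as perturbation directions.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # Block construction [[H, H], [H, -H]].
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def max_row_column_inner_product(m):
    """Largest |<row_i, column_j>| over all row/column pairs of a square matrix."""
    cols = list(zip(*m))
    return max(abs(sum(r * c for r, c in zip(row, col)))
               for row in m for col in cols)


def estimate_gradient(f, x, directions, sigma=1e-3):
    """Central finite-difference estimate of the gradient of f at x."""
    n = len(x)
    grad = [0.0] * n
    for d in directions:
        x_plus = [xi + sigma * di for xi, di in zip(x, d)]
        x_minus = [xi - sigma * di for xi, di in zip(x, d)]
        # Directional derivative of f along d.
        slope = (f(x_plus) - f(x_minus)) / (2.0 * sigma)
        for j in range(n):
            grad[j] += slope * d[j]
    # For the +1/-1 Hadamard rows used below (mutually orthogonal, squared
    # norm n), averaging over all n rows recovers the gradient itself.
    return [g / len(directions) for g in grad]


def optimize_policy(f, x0, directions, lr=0.1, steps=100):
    """Gradient ascent on f using the finite-difference estimate."""
    x = list(x0)
    for _ in range(steps):
        g = estimate_gradient(f, x, directions)
        x = [xi + lr * gi for xi, gi in zip(x, g)]
    return x


# Toy objective (assumption): higher is better, maximized at TARGET.
TARGET = [1.0, -2.0, 0.5, 3.0]


def toy_objective(x):
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, TARGET))


directions = sylvester_hadamard(4)
bound = max_row_column_inner_product(directions)
x_star = optimize_policy(toy_objective, [0.0] * 4, directions)
```

In this sketch `bound` is the quantity that the claim compares against a threshold value, and `x_star` plays the role of the control policy x*. In a real setting `toy_objective` would be replaced by a function that simulates the robot under the policy parameters and returns a scalar measure of task performance.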