US 12,393,175 B2
Systems and methods for skill learning with multiple critics
David Emukpere, Auvergne-Rhone-Alpes (FR); Bingbing Wu, Auvergne-Rhone-Alpes (FR); and Julien Perez, Auvergne-Rhone-Alpes (FR)
Assigned to Naver Corporation, (KR)
Filed by Naver Corporation, Gyeonggi-do (KR)
Filed on Nov. 16, 2023, as Appl. No. 18/511,829.
Prior Publication US 2025/0164966 A1, May 22, 2025
Int. Cl. G05B 19/4155 (2006.01); B25J 9/16 (2006.01)
CPC G05B 19/4155 (2013.01) [B25J 9/163 (2013.01); G05B 2219/39376 (2013.01)]
20 Claims
 
1. A computer-implemented method, comprising:
identifying a set of critics that pertain to a position-representing space, wherein each of the set of critics corresponds to a different objective function, and wherein the set of critics includes a first critic, a second critic, and a third critic;
accessing an objective function of the first critic that is based on a reach-reward objective configured to be anti-correlated with an extent of a movement from an initial position to a target position within the position-representing space;
accessing an objective function of the second critic that is based on a discovery-component objective configured to be correlated with an extent to which the movement triggers expansion of a bound or volume of the position-representing space; and
accessing an objective function of the third critic that is based on a safety-component objective that is configured to be anti-correlated with an extent to which any of one or more safety constraints are violated during or at a completion of the movement;
identifying, for each critic of the set of critics, a learned value function in the position-representing space that is based on the objective function of the critic;
assigning, to each critic of the set of critics, a weight;
receiving sensor data;
identifying a position within the position-representing space that corresponds to the received sensor data;
updating a policy based on the learned value function for each critic of the set of critics and the weights of the set of critics, wherein the policy corresponds to a movement objective in the position-representing space;
determining, based on the policy, a recommended transition within the position-representing space from the initial position to the target position; and
outputting a representation of the recommended transition within the position-representing space.
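
The steps recited in claim 1 can be illustrated with a small toy sketch. It is not the patented implementation, only one possible reading under stated assumptions: the position-representing space is a hypothetical 1-D line of seven cells, the reach-reward objective is taken as the negative remaining distance to the target, the discovery objective rewards leaving a fixed already-explored interval, and the safety objective penalizes entering one forbidden cell. All names (`N_POS`, `TARGET`, `UNSAFE`, `EXPLORED`, `learn_q`, `update_policy`, `recommend_transition`), the tabular value iteration, and the weight values are assumptions introduced for illustration.

```python
import numpy as np

# Hypothetical toy setup: none of these values come from the patent.
N_POS = 7            # positions 0..6 on a 1-D line
ACTIONS = (-1, 1)    # move one cell left or right
TARGET = 5           # target position
UNSAFE = {0}         # assumed safety constraint: cell 0 is forbidden
EXPLORED = (2, 3)    # assumed already-explored interval (inclusive)
GAMMA = 0.9          # discount factor

def step(pos, action):
    """Deterministic transition, clipped to the ends of the line."""
    return min(max(pos + action, 0), N_POS - 1)

# One objective function per critic, scored on the position reached.
OBJECTIVES = {
    # Reach: less reward the farther the reached position is from the target.
    "reach": lambda nxt: -abs(TARGET - nxt),
    # Discovery: reward for landing outside the explored interval.
    "discovery": lambda nxt: 0.0 if EXPLORED[0] <= nxt <= EXPLORED[1] else 1.0,
    # Safety: penalty for violating the constraint.
    "safety": lambda nxt: -1.0 if nxt in UNSAFE else 0.0,
}

def learn_q(objective, sweeps=100):
    """Tabular value iteration giving one critic's action-value table."""
    q = np.zeros((N_POS, len(ACTIONS)))
    for _ in range(sweeps):
        for s in range(N_POS):
            for a, action in enumerate(ACTIONS):
                nxt = step(s, action)
                q[s, a] = objective(nxt) + GAMMA * q[nxt].max()
    return q

def update_policy(q_by_critic, weights):
    """Greedy policy over the weighted sum of the critics' value tables."""
    combined = sum(weights[name] * q for name, q in q_by_critic.items())
    return combined.argmax(axis=1)

def recommend_transition(policy, sensor_reading):
    """Map a scalar sensor reading to a cell and return (current, next)."""
    pos = min(max(int(round(sensor_reading)), 0), N_POS - 1)
    return pos, step(pos, ACTIONS[policy[pos]])

q_by_critic = {name: learn_q(obj) for name, obj in OBJECTIVES.items()}
weights = {"reach": 1.0, "discovery": 0.2, "safety": 5.0}  # assumed weights
policy = update_policy(q_by_critic, weights)
print(recommend_transition(policy, 2.2))
```

With these assumed weights the reach critic dominates, so a sensor reading of 2.2 maps to cell 2 and the recommended transition is toward cell 3. Raising the discovery weight far enough would instead favor leaving the explored interval by the shortest route, which shows how the per-critic weights trade the objectives against one another.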