US 12,330,303 B2
Online augmentation of learned grasping
Ethan K. Gordon, Seattle, WA (US); and Rana Soltani Zarrin, Los Gatos, CA (US)
Assigned to Honda Motor Co., Ltd., Tokyo (JP)
Filed by Honda Motor Co., Ltd., Tokyo (JP)
Filed on Sep. 8, 2022, as Appl. No. 17/940,267.
Claims priority of provisional application 63/333,772, filed on Apr. 22, 2022.
Prior Publication US 2023/0339107 A1, Oct. 26, 2023
Int. Cl. B25J 9/16 (2006.01); B25J 13/00 (2006.01)

CPC B25J 9/163 (2013.01) [B25J 13/006 (2013.01)]

20 Claims

1. A system for online augmentation for learned grasping, comprising:

a processor; and

a memory storing instructions that when executed by the processor cause the processor to:

identify an action from a discrete action space for an environment of an agent, wherein the discrete action space includes a first set of grasps, wherein the agent is able to grasp an object in the environment, and wherein the action is a grasp that is defined as at least one contact point pair having an agent contact point associated with the agent and an object contact point associated with the object;

identify a second set of grasps of the agent based on a transition model and at least one contact parameter, wherein the agent is able to grasp the object in the environment, wherein a grasp is defined as at least one contact point pair having an agent contact point associated with the agent and an object contact point associated with the object, and wherein the at least one contact parameter defines allowed states of contact for the agent;

apply a reward function to evaluate each grasp of the second set of grasps based on a set of contact forces within a friction cone that minimizes a difference between an actual net wrench on the object and a predetermined net wrench, wherein the reward function is iteratively optimized online using a lookahead tree having a plurality of nodes, each node of the plurality of nodes predicting one or more next grasp states amongst the second set of grasps;

select a next grasp from the second set of grasps based on a reward value of the reward function associated with the plurality of nodes; and

cause the agent to execute the next grasp.
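
The steps recited in claim 1 can be illustrated with a small planar sketch. It is not the patented implementation; it is only one way to read the claim under stated assumptions. The object is assumed to be a unit square with four hypothetical candidate contacts (CONTACTS), the friction coefficient MU is an arbitrary value, the transition model is a toy rule that moves one contact at a time, and the contact parameter is modeled as a set of allowed contact indices. In the plane, each friction cone is spanned by its two edge directions, so "contact forces within a friction cone" reduces to non-negative weights on those edges. The weights are found here with a simple projected-gradient loop, which is a swapped-in solver chosen for brevity. All function names are hypothetical.

```python
import numpy as np

MU = 0.5  # assumed friction coefficient (hypothetical value)

# Hypothetical candidate contacts on a unit square: (position, inward normal).
CONTACTS = [
    (np.array([-0.5, 0.0]), np.array([1.0, 0.0])),   # 0: left face
    (np.array([0.5, 0.0]), np.array([-1.0, 0.0])),   # 1: right face
    (np.array([0.0, -0.5]), np.array([0.0, 1.0])),   # 2: bottom face
    (np.array([0.0, 0.5]), np.array([0.0, -1.0])),   # 3: top face
]

def wrench_basis(grasp):
    """Columns are planar wrenches (fx, fy, torque) of each friction-cone edge."""
    cols = []
    for i in grasp:
        p, n = CONTACTS[i]
        t = np.array([-n[1], n[0]])          # tangent direction at the contact
        for d in (n + MU * t, n - MU * t):   # the two cone edges
            cols.append([d[0], d[1], p[0] * d[1] - p[1] * d[0]])
    return np.array(cols).T

def reward(grasp, target, iters=500):
    """Negative residual of min ||A x - target|| subject to x >= 0.

    Non-negative edge weights x keep every contact force inside its cone.
    """
    A = wrench_basis(grasp)
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / (largest singular value)^2
    x = np.zeros(A.shape[1])
    for _ in range(iters):                   # projected gradient descent
        x = np.maximum(0.0, x - step * (A.T @ (A @ x - target)))
    return -float(np.linalg.norm(A @ x - target))

def transitions(grasp, allowed=frozenset(range(len(CONTACTS)))):
    """Toy transition model: move one contact to an unused, allowed candidate.

    `allowed` stands in for a contact parameter restricting contact states.
    """
    out = set()
    for k in range(len(grasp)):
        for j in allowed:
            if j not in grasp:
                out.add(tuple(sorted(grasp[:k] + (j,) + grasp[k + 1:])))
    return sorted(out)

def lookahead_value(grasp, target, depth, gamma=0.9):
    """Reward of this node plus the discounted best value among its children."""
    value = reward(grasp, target)
    children = transitions(grasp)
    if depth == 0 or not children:
        return value
    return value + gamma * max(
        lookahead_value(g, target, depth - 1, gamma) for g in children
    )

def select_next_grasp(current, target, depth=1):
    """Pick the reachable grasp with the highest lookahead value."""
    return max(transitions(current),
               key=lambda g: lookahead_value(g, target, depth))
```

In this sketch a call such as select_next_grasp((0, 2), np.array([-1.0, 0.0, 0.0])) starts from a left-and-bottom grasp, which cannot push the object in the negative x direction, and searches the reachable grasps for one whose contact forces can produce the requested net wrench. The depth and gamma parameters are illustrative choices for how far the tree looks ahead and how strongly later nodes are discounted.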