US 12,109,701 B2
Guided uncertainty-aware policy optimization: combining model-free and model-based strategies for sample-efficient learning
Jonathan Tremblay, Redmond, WA (US); Dieter Fox, Seattle, WA (US); Michelle Lee, Seattle, WA (US); Carlos Florensa, Seattle, WA (US); Nathan Donald Ratliff, Seattle, WA (US); Animesh Garg, Berkeley, CA (US); and Fabio Tozeto Ramos, Seattle, WA (US)
Assigned to NVIDIA Corporation, Santa Clara, CA (US)
Filed by NVIDIA Corporation, Santa Clara, CA (US)
Filed on Feb. 3, 2020, as Appl. No. 16/780,465.
Claims priority of provisional application 62/938,101, filed on Nov. 20, 2019.
Prior Publication US 2021/0146531 A1, May 20, 2021
Int. Cl. B25J 9/16 (2006.01); G05B 13/02 (2006.01); G05B 13/04 (2006.01); G06N 3/08 (2023.01); G06N 5/046 (2023.01); G06N 20/00 (2019.01)
CPC B25J 9/163 (2013.01) [B25J 9/1661 (2013.01); B25J 9/1664 (2013.01); B25J 9/1697 (2013.01); G05B 13/027 (2013.01); G05B 13/04 (2013.01); G06N 3/08 (2013.01); G06N 5/046 (2013.01); G06N 20/00 (2019.01)]
30 Claims
 
1. A computer-implemented method, comprising:
dividing at least a portion of a physical model created based at least in part on information from a perception system into a plurality of regions;
generating estimates of uncertainty for the plurality of regions based at least in part on at least one uncertainty estimation provided by the perception system;
using the physical model to control a robot in any of the plurality of regions associated with any of the estimates of uncertainty that indicate the robot is unlikely to interact with its environment; and
using at least one reinforcement learning process instead of using the physical model to control the robot in any of the plurality of regions associated with any of the estimates of uncertainty that indicate the robot is likely to interact with its environment.
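The control flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the one-dimensional workspace, the Gaussian contact score, the 0.05 threshold, and all function names (`divide_into_regions`, `contact_likelihood`, `select_controller`, `plan_controllers`) are hypothetical choices made only for illustration. The sketch assumes the perception system reports an object position as a mean and a standard deviation, and treats cells near that estimate as regions where the robot is likely to interact with its environment.

```python
import math
from typing import List, Tuple


def divide_into_regions(start: float, stop: float, n: int) -> List[float]:
    """Split [start, stop] into n equal-width cells and return their centers."""
    width = (stop - start) / n
    return [start + (i + 0.5) * width for i in range(n)]


def contact_likelihood(center: float, mean: float, std: float) -> float:
    """Hypothetical per-region score derived from the perception estimate.

    An unnormalized Gaussian: 1.0 at the perceived object position and
    falling toward 0 far from it. This is an assumption for illustration,
    not a formula taken from the patent.
    """
    z = (center - mean) / std
    return math.exp(-0.5 * z * z)


def select_controller(likelihood: float, threshold: float = 0.05) -> str:
    """Pick the control strategy for one region.

    Regions where interaction is likely go to the learned policy;
    all other regions use the physical model.
    """
    if likelihood >= threshold:
        return "reinforcement_learning"
    return "model_based"


def plan_controllers(
    start: float, stop: float, n: int, mean: float, std: float,
    threshold: float = 0.05,
) -> List[Tuple[float, str]]:
    """Assign a controller to every region of the workspace."""
    centers = divide_into_regions(start, stop, n)
    return [
        (c, select_controller(contact_likelihood(c, mean, std), threshold))
        for c in centers
    ]


if __name__ == "__main__":
    # Object perceived at 0.9 with standard deviation 0.05 (made-up values).
    for center, mode in plan_controllers(0.0, 1.0, 10, mean=0.9, std=0.05):
        print(f"{center:.2f}: {mode}")
```

With these made-up values, only the cells close to the perceived object position are assigned to the learned policy, and the rest of the workspace is handled by the physical model. A lower threshold or a larger perception standard deviation widens the region handed to reinforcement learning.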