CPC G06N 20/00 (2019.01); 14 Claims

1. A method for reinforcement learning (RL) of continuous actions for controlling physical systems, comprising:
receiving a state as input to at least one actor network to predict candidate actions based on the state, wherein the state is a current observation;
outputting the candidate actions from the at least one actor network;
receiving the state and the candidate actions as inputs to a plurality of distributional critic networks trained in parallel, with independent random initializations of network parameters, through interactions with an environment, wherein there is no direct interaction between different critic networks, wherein the plurality of distributional critic networks calculates quantiles of a return distribution associated with the candidate actions in relation to the state, and wherein the plurality of distributional critic networks converges to the same values for previously visited state-action pairs while disagreeing on novel state-action pairs;
outputting the quantiles from the plurality of distributional critic networks; and
selecting an output action based on the candidate actions and the quantiles, wherein the selecting comprises:
executing high-epistemic-uncertainty actions in early training stages to accelerate exploration of optimal control parameters for a physical system; and
transitioning to low-uncertainty actions in later stages to promote convergence to optimal control policies to control the physical system.
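For illustration only, the following is a minimal sketch of the action-selection method recited in claim 1, written in PyTorch. Every class, function, and parameter name (Actor, QuantileCritic, select_action, beta, n_candidates, noise, N_QUANTILES, N_CRITICS, and the network sizes) is a hypothetical choice, not language from the patent, and the beta term stands in for the claimed transition from high-epistemic-uncertainty actions early in training to low-uncertainty actions later.

```python
import torch
import torch.nn as nn

N_QUANTILES = 32  # quantiles of the return distribution per critic
N_CRITICS = 5     # independently initialized critics, trained with no
                  # direct interaction between them

class Actor(nn.Module):
    """Maps a state (current observation) to a candidate action."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh())

    def forward(self, state):
        return self.net(state)  # continuous action in [-1, 1]^action_dim

class QuantileCritic(nn.Module):
    """Outputs quantiles of the return distribution for a state-action pair."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, N_QUANTILES))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def select_action(actor, critics, state, beta, n_candidates=16, noise=0.1):
    """Pick an output action from noisy candidates around the actor's output.

    beta > 0 favors high ensemble disagreement (epistemic uncertainty),
    i.e. exploration early in training; beta <= 0 favors low-uncertainty
    actions, promoting convergence to the control policy later in training.
    """
    base = actor(state)  # candidate action proposed by the actor
    candidates = (base + noise * torch.randn(n_candidates, base.shape[-1])
                  ).clamp(-1.0, 1.0)
    states = state.expand(n_candidates, -1)
    # Stack each critic's quantiles: (N_CRITICS, n_candidates, N_QUANTILES).
    quantiles = torch.stack([c(states, candidates) for c in critics])
    mean_return = quantiles.mean(dim=-1)  # expected return per critic
    value = mean_return.mean(dim=0)       # ensemble-mean value estimate
    epistemic = mean_return.std(dim=0)    # disagreement = epistemic uncertainty
    return candidates[(value + beta * epistemic).argmax()]

# Independent random initializations: each critic instance draws its own
# fresh parameters, so after training the critics agree on well-visited
# state-action pairs but disagree on novel ones.
critics = [QuantileCritic(state_dim=8, action_dim=2) for _ in range(N_CRITICS)]
actor = Actor(state_dim=8, action_dim=2)

state = torch.randn(8)
action_early = select_action(actor, critics, state, beta=+1.0)  # explore
action_late = select_action(actor, critics, state, beta=-0.5)   # exploit
```

In practice beta would be annealed over training steps, for example decayed linearly from a positive value to a non-positive one; the claim does not specify a schedule, so any particular annealing rule is an assumption of this sketch.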