US 11,657,266 B2
Cooperative multi-goal, multi-agent, multi-stage reinforcement learning
Jiachen Yang, San Jose, CA (US); Alireza Nakhaei Sarvedani, Sunnyvale, CA (US); David Francis Isele, Sunnyvale, CA (US); and Kikuo Fujimura, Palo Alto, CA (US)
Assigned to HONDA MOTOR CO., LTD., Tokyo (JP)
Filed by Honda Motor Co., Ltd., Tokyo (JP)
Filed on Nov. 16, 2018, as Appl. No. 16/193,291.
Prior Publication US 2020/0160168 A1, May 21, 2020
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 3/08 (2023.01); G05D 1/00 (2006.01); H04W 4/44 (2018.01); G06N 3/045 (2023.01); G06N 3/047 (2023.01)
CPC G06N 3/08 (2013.01) [G05D 1/0088 (2013.01); G06N 3/045 (2023.01); G06N 3/047 (2023.01); H04W 4/44 (2018.02)] 20 Claims
OG exemplary drawing
 
1. A method for cooperative multi-goal, multi-agent, multi-stage (CM3) reinforcement learning, comprising:
training a first agent based on a first policy gradient and training a first critic based on a first loss function to learn one or more goals in a single-agent environment using a Markov decision process, wherein the first agent is associated with a first agent neural network and the first critic is associated with a first critic neural network;
training a number N of agents based on the first policy gradient and a second policy gradient, and training a second critic based on the first loss function and a second loss function, to learn cooperation between the N agents in a multi-agent environment using a Markov game to instantiate a second agent neural network, wherein each of the N agents is instantiated with the first agent neural network in a pre-trained fashion, and wherein the N agents are of the same type as the first agent;
generating a cooperative multi-goal, multi-agent, multi-stage network policy based on the first agent neural network and the second agent neural network; and
operating an autonomous vehicle in an autonomous fashion based on the cooperative multi-goal, multi-agent, multi-stage network policy.
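For context, the sketch below illustrates the two-stage curriculum recited in claim 1: a stage-1 single-agent actor-critic trained with a policy gradient and a critic loss, followed by a stage-2 multi-agent setup in which N copies of the pre-trained agent network and a second critic are instantiated. This is a minimal, hypothetical sketch assuming a PyTorch actor-critic formulation; the class names, network sizes, and environment interface are illustrative assumptions, not the patented implementation.

```python
# Illustrative sketch of the two-stage CM3 curriculum in claim 1.
# All names, dimensions, and interfaces are assumptions, not the patent's code.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    """Goal-conditioned policy network (the 'agent neural network')."""
    def __init__(self, obs_dim, goal_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs, goal):
        logits = self.net(torch.cat([obs, goal], dim=-1))
        return torch.distributions.Categorical(logits=logits)

class Critic(nn.Module):
    """State-value network (the 'critic neural network')."""
    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def stage1_update(actor, critic, batch, actor_opt, critic_opt, gamma=0.99):
    """One actor-critic step in the single-agent MDP:
    a TD loss for the critic ('first loss function') and a
    policy-gradient step for the actor ('first policy gradient')."""
    obs, goal, act, rew, next_obs, done = batch
    with torch.no_grad():
        target = rew + gamma * (1.0 - done) * critic(torch.cat([next_obs, goal], dim=-1))
    value = critic(torch.cat([obs, goal], dim=-1))
    critic_loss = F.mse_loss(value, target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    advantage = (target - value).detach()
    log_prob = actor(obs, goal).log_prob(act)
    actor_loss = -(log_prob * advantage).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

def build_stage2(stage1_actor, n_agents, joint_obs_dim, goal_dim):
    """Stage 2: instantiate N agents from the pre-trained stage-1 actor
    and add a second critic over the joint observation and goals."""
    agents = [copy.deepcopy(stage1_actor) for _ in range(n_agents)]
    joint_critic = Critic(joint_obs_dim + n_agents * goal_dim)
    return agents, joint_critic
```

In this sketch, `stage1_update` stands in for training the first agent and first critic in the single-agent environment, while `build_stage2` corresponds to instantiating the N pre-trained agents and the second critic for the Markov game; combining the resulting networks would yield the cooperative multi-goal, multi-agent, multi-stage network policy that claim 1 applies to operating the autonomous vehicle.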