US 11,657,251 B2
System and method for multi-agent reinforcement learning with periodic parameter sharing
Alireza Nakhaei Sarvedani, Sunnyvale, CA (US); Kikuo Fujimura, Palo Alto, CA (US); and Safa Cicek, Los Angeles, CA (US)
Assigned to HONDA MOTOR CO., LTD., Tokyo (JP)
Filed by Honda Motor Co., Ltd., Tokyo (JP)
Filed on Nov. 11, 2019, as Appl. No. 16/680,395.
Claims priority of provisional application 62/759,957, filed on Nov. 12, 2018.
Prior Publication US 2020/0151564 A1, May 14, 2020
Int. Cl. G06V 10/764 (2022.01); G06N 3/08 (2023.01); G06N 3/04 (2023.01)
CPC G06N 3/08 (2013.01) [G06N 3/0454 (2013.01)]
20 Claims
1. A computer-implemented method for multi-agent reinforcement learning with periodic parameter sharing, comprising:
inputting at least one occupancy grid to a convolutional neural network (CNN) and at least one vehicle dynamic parameter into a first fully connected layer, wherein the at least one occupancy grid and the at least one vehicle dynamic parameter are associated with at least one of: an ego agent and a target agent;
concatenating outputs of the CNN and the first fully connected layer, wherein the concatenated outputs of the first fully connected layer and the CNN are inputted into a long short-term memory unit (LSTM), wherein the LSTM is configured to output temporal data that includes classification and prediction based data that is associated with a time series that pertains to a multi-agent environment and at least one of: the ego agent and the target agent;
providing Q value estimates for agent actions based on processing of the concatenated outputs and choosing at least one autonomous action to be executed by at least one of: the ego agent and the target agent; and
processing a multi-agent policy that accounts for operation of the ego agent and the target agent with respect to one another within the multi-agent environment based on the at least one autonomous action to be executed by at least one of: the ego agent and the target agent.
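
The data flow recited in claim 1 can be illustrated with a minimal sketch in plain Python. It is not the patented implementation. The claim does not specify the grid size, kernel size, layer widths, number of actions, or the action-selection rule, so all of those are assumptions here, as are the names `QNetworkSketch`, `step`, and `choose_action`. The single-channel convolution, the linear Q head placed after the LSTM, and the epsilon-greedy selection are likewise illustrative choices, and the weights are random rather than trained.

```python
import math
import random

def dense(x, w, b):
    """Fully connected layer: y = W x + b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def relu(v):
    return [max(0.0, a) for a in v]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def conv2d_valid(grid, kernel):
    """Single-channel 'valid' 2D cross-correlation, returned flattened."""
    k, h, w = len(kernel), len(grid), len(grid[0])
    return [sum(grid[i + a][j + b] * kernel[a][b]
                for a in range(k) for b in range(k))
            for i in range(h - k + 1) for j in range(w - k + 1)]

def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class QNetworkSketch:
    """Hypothetical per-agent network: CNN + FC -> concat -> LSTM -> Q values."""

    def __init__(self, grid_size=5, kernel_size=3, n_dyn=2, dyn_hidden=4,
                 lstm_hidden=8, n_actions=3, seed=0):
        # All sizes are assumptions; the claim does not state them.
        rng = random.Random(seed)
        conv_out = (grid_size - kernel_size + 1) ** 2
        self.kernel = rand_matrix(rng, kernel_size, kernel_size)
        self.w_dyn = rand_matrix(rng, dyn_hidden, n_dyn)
        self.b_dyn = [0.0] * dyn_hidden
        in_dim = conv_out + dyn_hidden + lstm_hidden  # LSTM sees [x_t, h_{t-1}]
        # One weight matrix per gate: input, forget, output, candidate.
        self.w_gates = [rand_matrix(rng, lstm_hidden, in_dim) for _ in range(4)]
        self.b_gates = [[0.0] * lstm_hidden for _ in range(4)]
        self.w_q = rand_matrix(rng, n_actions, lstm_hidden)
        self.b_q = [0.0] * n_actions
        self.lstm_hidden = lstm_hidden

    def initial_state(self):
        return [0.0] * self.lstm_hidden, [0.0] * self.lstm_hidden

    def step(self, grid, dynamics, state):
        """Process one time step; returns (Q values, new LSTM state)."""
        h_prev, c_prev = state
        grid_feat = relu(conv2d_valid(grid, self.kernel))         # CNN branch
        dyn_feat = relu(dense(dynamics, self.w_dyn, self.b_dyn))  # first FC layer
        z = grid_feat + dyn_feat + h_prev                         # concatenation
        i, f, o, g = (dense(z, w, b) for w, b in zip(self.w_gates, self.b_gates))
        c = [sigmoid(fk) * ck + sigmoid(ik) * math.tanh(gk)
             for fk, ck, ik, gk in zip(f, c_prev, i, g)]
        h = [sigmoid(ok) * math.tanh(ck) for ok, ck in zip(o, c)]
        q = dense(h, self.w_q, self.b_q)  # one Q estimate per action
        return q, (h, c)

def choose_action(q_values, epsilon, rng):
    """Epsilon-greedy selection (an assumed rule, not recited in the claim)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

# Usage: feed a short time series of observations for one agent.
net = QNetworkSketch()
state = net.initial_state()
rng = random.Random(1)
for t in range(3):
    grid = [[1.0 if (r + c + t) % 4 == 0 else 0.0 for c in range(5)]
            for r in range(5)]
    q_values, state = net.step(grid, [0.5, -0.1], state)  # e.g. speed, steering
    action = choose_action(q_values, epsilon=0.1, rng=rng)
```

In a multi-agent setting, one such network could be instantiated for the ego agent and another for the target agent, each fed its own occupancy grid and dynamic parameters. The LSTM state carried between calls to `step` is what lets the Q estimates depend on the time series rather than on a single observation.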