CPC G06N 5/043 (2013.01) [G06N 20/00 (2019.01)]; 20 Claims
1. A computer-implemented method comprising:
for each agent of a multi-agent system, sampling an action with a policy of the agent based on a first state, wherein at least one agent of the multi-agent system is an implicit agent that plays against other agents of the multi-agent system by playing to minimize both an expected immediate reward for the implicit agent and an expected future reward for the implicit agent;
executing a joint action with the agents and observing a second state;
receiving an uncertain reward at each agent in response to the joint action;
storing the joint action, uncertain reward, first state, and second state in a replay buffer accessible to each agent;
for each agent, until a terminal state is reached:
sampling a random batch of samples from the replay buffer,
updating a critic of the agent by minimizing loss between a predicted version of an action-value function and an uncertain version of the action-value function, and
updating an actor of the agent, the updating to factor in the uncertain version of the action-value function.
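The claimed steps resemble a multi-agent actor-critic training loop with a shared replay buffer, in which one "implicit" agent plays adversarially by minimizing both its immediate and its future reward. Below is a minimal runnable sketch under simplifying assumptions: a toy discrete environment, tabular critics and actor preferences, and reward noise standing in for the "uncertain reward." All names (`Agent`, `ReplayBuffer`, `toy_env_step`) are illustrative and do not appear in the claim.

```python
import random
from collections import deque

class ReplayBuffer:
    """Replay buffer accessible to every agent (claim step: storing)."""
    def __init__(self, capacity=1000):
        self.buf = deque(maxlen=capacity)

    def store(self, joint_action, reward, s1, s2):
        self.buf.append((joint_action, reward, s1, s2))

    def sample(self, k):
        # Random batch of samples from the replay buffer.
        return random.sample(list(self.buf), min(k, len(self.buf)))

class Agent:
    """Tabular actor-critic over a tiny discrete state/action space.

    An implicit agent plays to MINIMIZE both its expected immediate
    reward and its expected future reward (simplified here by taking
    min instead of max over actions).
    """
    def __init__(self, idx, n_actions, implicit=False, lr=0.1, gamma=0.9):
        self.idx = idx            # which slot of the joint action is ours
        self.q = {}               # critic: (state, action) -> value estimate
        self.pref = {}            # actor preferences: (state, action) -> score
        self.n_actions = n_actions
        self.implicit = implicit
        self.lr, self.gamma = lr, gamma

    def act(self, state):
        scores = [self.pref.get((state, a), 0.0) for a in range(self.n_actions)]
        pick = min if self.implicit else max   # implicit agent minimizes
        return pick(range(self.n_actions), key=lambda a: scores[a])

    def update(self, batch):
        for joint_action, reward, s1, s2 in batch:
            a = joint_action[self.idx]
            extreme = min if self.implicit else max
            # "Uncertain" (bootstrapped) version of the action-value function.
            future = extreme(self.q.get((s2, b), 0.0) for b in range(self.n_actions))
            target = reward + self.gamma * future
            # Critic update: minimize the gap between prediction and target.
            pred = self.q.get((s1, a), 0.0)
            self.q[(s1, a)] = pred + self.lr * (target - pred)
            # Actor update factors in the uncertain action-value estimate.
            self.pref[(s1, a)] = self.pref.get((s1, a), 0.0) + self.lr * self.q[(s1, a)]

def toy_env_step(state, joint_action):
    """Toy environment: reward 1 when all agents pick action 0, else 0."""
    reward = 1.0 if all(a == 0 for a in joint_action) else 0.0
    return (state + 1) % 3, reward

random.seed(0)
agents = [Agent(idx=0, n_actions=2), Agent(idx=1, n_actions=2, implicit=True)]
buffer = ReplayBuffer()
state = 0
for _ in range(50):
    joint = tuple(ag.act(state) for ag in agents)    # sample action per agent
    next_state, reward = toy_env_step(state, joint)  # execute joint action
    noisy = reward + random.gauss(0, 0.01)           # uncertain reward
    buffer.store(joint, noisy, state, next_state)    # shared replay buffer
    for ag in agents:
        ag.update(buffer.sample(8))                  # critic + actor updates
    state = next_state
```

In a full implementation the tabular critic would be a neural network trained by gradient descent on the squared loss between predicted and target action-values, but the sketch preserves the claim's control flow: per-agent action sampling, joint execution, shared buffer storage, and per-agent critic-then-actor updates on random batches.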