US 12,061,673 B1
Multi-agent planning and autonomy
Sean Soleyman, Calabasas, CA (US); and Deepak Khosla, Camarillo, CA (US)
Assigned to HRL LABORATORIES, LLC, Malibu, CA (US)
Filed by HRL Laboratories, LLC, Malibu, CA (US)
Filed on Feb. 3, 2021, as Appl. No. 17/167,001.
Application 17/167,001 is a continuation in part of application No. 17/064,566, filed on Oct. 6, 2020.
Application 17/064,566 is a continuation in part of application No. 16/792,869, filed on Feb. 17, 2020.
Claims priority of provisional application 63/027,829, filed on May 20, 2020.
Claims priority of provisional application 62/953,008, filed on Dec. 23, 2019.
Claims priority of provisional application 62/814,133, filed on Mar. 5, 2019.
Int. Cl. G06F 18/21 (2023.01); G06N 3/086 (2023.01)

CPC G06F 18/2185 (2023.01) [G06N 3/086 (2013.01)]

18 Claims

1. A system for controlling a plurality of autonomous platforms, the system comprising:

one or more processors and a non-transitory computer-readable medium having executable instructions encoded thereon such that when executed, the one or more processors perform operations of:

performing a training process to produce a trained learning agent in a simulation environment comprising a plurality of controlled platforms and a plurality of target platforms, comprising:

for each simulation time-step until an episode ends:

assigning each controlled platform to one target platform, wherein each target platform produces an observation;

processing, with a learning agent, each observation using a deep learning network;

for each observation, producing, by the learning agent, an action corresponding to each controlled platform until an action has been produced for each controlled platform in the simulation environment;

obtaining a reward value corresponding to the episode; and

executing the trained learning agent to control each autonomous platform comprising a plurality of platform sensors and platform actuators, wherein executing the trained agent comprises:

receiving, by the trained agent, one or more observations from at least one platform sensor;

producing a behavior-level action based on the one or more observations; and

causing one or more platform actuators to perform a physical action based on the behavior-level action.
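
The operations recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything specific in it is an assumption made for illustration: the 2-D point platforms, the nearest-target assignment rule, the relative-position observation, the three behavior names, the tiny two-layer network standing in for the deep learning network, the distance-based episode reward, and the mutate-and-keep-if-better training step. The claim itself specifies none of these details.

```python
import math
import random

BEHAVIORS = ["pursue", "evade", "hold"]  # hypothetical behavior-level actions


class Agent:
    """Tiny two-layer network (stand-in for the deep learning network)."""

    def __init__(self, rng, hidden=4):
        self.w1 = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(hidden)]
        self.w2 = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in BEHAVIORS]

    def act(self, obs):
        """Map one 2-D observation to one behavior-level action."""
        h = [math.tanh(sum(w * x for w, x in zip(row, obs))) for row in self.w1]
        scores = [sum(w * x for w, x in zip(row, h)) for row in self.w2]
        return BEHAVIORS[scores.index(max(scores))]

    def mutated(self, rng, sigma=0.3):
        """Return a copy with Gaussian noise added to every weight."""
        child = Agent(rng, hidden=len(self.w1))
        child.w1 = [[w + rng.gauss(0, sigma) for w in row] for row in self.w1]
        child.w2 = [[w + rng.gauss(0, sigma) for w in row] for row in self.w2]
        return child


def assign(controlled, targets):
    """Assign each controlled platform to one target (assumed rule: nearest)."""
    return [min(range(len(targets)), key=lambda j: math.dist(c, targets[j]))
            for c in controlled]


def apply_behavior(pos, target, behavior, step=0.5):
    """Turn a behavior-level action into a low-level position update."""
    dx, dy = target[0] - pos[0], target[1] - pos[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return pos
    sign = {"pursue": 1.0, "evade": -1.0, "hold": 0.0}[behavior]
    move = min(step, dist) if sign > 0 else step  # do not overshoot the target
    return (pos[0] + sign * move * dx / dist, pos[1] + sign * move * dy / dist)


def run_episode(agent, seed, steps=20):
    """Simulate one episode and return a single reward for the whole episode."""
    rng = random.Random(seed)
    controlled = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(3)]
    targets = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(2)]
    for _ in range(steps):                          # each simulation time-step
        pairing = assign(controlled, targets)       # platform -> target
        updated = []
        for c, j in zip(controlled, pairing):       # one action per platform
            t = targets[j]
            obs = (t[0] - c[0], t[1] - c[1])        # observation for this pairing
            updated.append(apply_behavior(c, t, agent.act(obs)))
        controlled = updated
    pairing = assign(controlled, targets)
    total = sum(math.dist(c, targets[j]) for c, j in zip(controlled, pairing))
    return -total / len(controlled)                 # closer to targets is better


def train(generations=30, seed=0, eval_seeds=(0, 1, 2, 3, 4)):
    """Assumed training rule: keep a mutated agent only if its reward improves."""
    rng = random.Random(seed)

    def fitness(agent):
        return sum(run_episode(agent, s) for s in eval_seeds) / len(eval_seeds)

    best = Agent(rng)
    best_score = fitness(best)
    for _ in range(generations):
        candidate = best.mutated(rng)
        score = fitness(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


def control_step(agent, sensor_reading, actuator):
    """Execution phase: sensor observation -> behavior -> actuator command."""
    behavior = agent.act(sensor_reading)
    actuator(behavior)  # hypothetical callback that drives the hardware
    return behavior


if __name__ == "__main__":
    trained, score = train()
    print("mean episode reward:", score)
    print("chosen behavior:", control_step(trained, (2.0, -1.0), print))
```

The sketch keeps the two phases of the claim separate: `train` only ever sees the simulation through `run_episode`, while `control_step` takes a sensor reading and an actuator callback, so the same trained agent could in principle be attached to real hardware interfaces.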