US 12,271,787 B2
Robust, scalable and generalizable machine learning paradigm for multi-agent applications
Deepak Khosla, Camarillo, CA (US); and Sean Soleyman, Calabasas, CA (US)
Assigned to HRL LABORATORIES, LLC, Malibu, CA (US)
Filed by HRL Laboratories, LLC, Malibu, CA (US)
Filed on Feb. 17, 2020, as Appl. No. 16/792,869.
Claims priority of provisional application 62/814,133, filed on Mar. 5, 2019.
Prior Publication US 2020/0285995 A1, Sep. 10, 2020
Int. Cl. G06N 20/00 (2019.01)
CPC G06N 20/00 (2019.01) 18 Claims
OG exemplary drawing
 
1. A computer-implemented method for multi-agent applications, the method comprising:
training a plurality of learning agents to maximize a reward function returned by an environment, the learning agents including both tactical agents and strategic agents, wherein each learning agent is trained in an initial state space, the initial state space being the set of all possible conditions that may exist in the environment at the start of a training episode, and wherein the initial state space is sequentially expanded through a series of increasingly larger environments by sequentially increasing the field of view of each learning agent, each expansion occurring after at least two of the learning agents have fitness levels within a predetermined threshold (see the first sketch following the claim);
wherein training the plurality of learning agents includes maintaining a fitness level for each learning agent during training, the fitness level representing an average of the net reward obtained by the learning agent over its training episodes, and selecting one or more learning agents for additional training based on their fitness relative to a collective fitness of the learning agents, such that learning agents whose fitness level improves are provided the additional training, while learning agents whose fitness level shows no improvement after a predetermined number of episodes are frozen at their current learned state and used as opponents during the additional training (see the second sketch following the claim);
taking, by one or more of the strategic agents, an observation from the environment and selecting one or more of the tactical agents to produce an action that is used to control a platform's actuators or simulated movements in the environment to complete a task; and
producing, by the one or more selected tactical agents, the action corresponding to a learned low-level behavior to control the platform's actuators or simulated movements in the environment to complete the task (see the third sketch following the claim).
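
A minimal Python sketch of the curriculum-expansion loop in the first claim element, assuming a grid-style environment where the field of view is measured in cells; Agent, run_episode, FOV_SCHEDULE, and the 0.05 threshold are illustrative assumptions rather than the patent's actual implementation:

    import random
    from dataclasses import dataclass

    @dataclass
    class Agent:
        fov: int                 # current field of view, in grid cells (assumed)
        fitness: float = 0.0     # running average of net episode reward
        episodes: int = 0

        def update_fitness(self, net_reward: float) -> None:
            # Incrementally update the running average of net reward.
            self.episodes += 1
            self.fitness += (net_reward - self.fitness) / self.episodes

    def run_episode(agent: Agent) -> float:
        # Stand-in for one full training episode; returns a net reward.
        return random.gauss(agent.fov * 0.1, 1.0)

    def ready_to_expand(agents, threshold: float) -> bool:
        # Gate: at least two agents' fitness levels lie within `threshold`.
        f = sorted(a.fitness for a in agents)
        return any(hi - lo <= threshold for lo, hi in zip(f, f[1:]))

    FOV_SCHEDULE = [5, 9, 15, 25]    # increasingly larger observable areas
    agents = [Agent(fov=FOV_SCHEDULE[0]) for _ in range(4)]

    for next_fov in FOV_SCHEDULE[1:]:
        while True:
            for a in agents:                       # one round of training
                a.update_fitness(run_episode(a))
            if ready_to_expand(agents, threshold=0.05):
                break
        for a in agents:
            a.fov = next_fov   # expand the initial state space via a larger field of view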
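
The fitness-maintenance and freezing rule of the second element could be bookkept as follows; PATIENCE stands in for the claim's "predetermined number of episodes," and all names are hypothetical:

    from dataclasses import dataclass

    PATIENCE = 100   # episodes without improvement before freezing (assumed)

    @dataclass
    class TrackedAgent:
        fitness: float = float("-inf")        # current average net reward
        best_fitness: float = float("-inf")   # best fitness seen so far
        stale_episodes: int = 0
        frozen: bool = False                  # frozen agents stop learning

    def record_fitness(agent: TrackedAgent, new_fitness: float) -> None:
        agent.fitness = new_fitness
        if new_fitness > agent.best_fitness:
            agent.best_fitness = new_fitness
            agent.stale_episodes = 0          # improvement resets the counter
        else:
            agent.stale_episodes += 1
            if agent.stale_episodes >= PATIENCE:
                agent.frozen = True           # fixed at current learned state

    def partition(population):
        # Improving agents receive additional training; frozen agents
        # are reused as opponents for that training.
        learners  = [a for a in population if not a.frozen]
        opponents = [a for a in population if a.frozen]
        return learners, opponents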
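
The two-level hierarchy of the last two elements, in which a strategic agent observes the environment and delegates to a selected tactical agent that emits the low-level control action, might be wired as below; the stub policies and the two-dimensional action format are purely illustrative:

    from typing import Callable, Sequence

    Observation = Sequence[float]
    Action = Sequence[float]

    class TacticalAgent:
        # Wraps one learned low-level behavior (e.g., pursue or evade).
        def __init__(self, policy: Callable[[Observation], Action]):
            self.policy = policy

        def act(self, obs: Observation) -> Action:
            return self.policy(obs)

    class StrategicAgent:
        # Takes an observation and selects which tactical agent to run.
        def __init__(self, tactics: list, selector: Callable[[Observation], int]):
            self.tactics = tactics
            self.selector = selector

        def act(self, obs: Observation) -> Action:
            choice = self.selector(obs)            # high-level decision
            return self.tactics[choice].act(obs)   # delegated low-level action

    # Illustrative usage with trivial stand-in policies:
    pursue = TacticalAgent(lambda obs: [1.0, 0.0])   # e.g., full throttle, no turn
    evade  = TacticalAgent(lambda obs: [0.5, 1.0])   # e.g., slow down, hard turn
    strategist = StrategicAgent([pursue, evade],
                                selector=lambda obs: 0 if obs[0] > 0 else 1)
    action = strategist.act([0.3, -0.2])             # -> [1.0, 0.0]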