US 12,282,337 B2
Dual agent reinforcement learning based system for autonomous operation of aircraft
Prabhat Kumar, Hajip (IN)
Assigned to The Boeing Company, Arlington, VA (US)
Filed by The Boeing Company, Chicago, IL (US)
Filed on Apr. 14, 2022, as Appl. No. 17/721,041.
Claims priority of application No. 202111032985 (IN), filed on Jul. 22, 2021.
Prior Publication US 2023/0025154 A1, Jan. 26, 2023
Int. Cl. G05D 1/00 (2024.01); B64D 43/00 (2006.01)
CPC G05D 1/085 (2013.01) [B64D 43/00 (2013.01); G05D 1/0088 (2013.01); G05D 1/0623 (2013.01); G05D 1/101 (2013.01)] 20 Claims
 
1. A system comprising:
an artificial neural network;
a cost agent configured to calculate a highest cost reward Q value based on state-action vectors associated with an aircraft, the highest cost reward Q value is associated with maximum fuel efficiency and aircraft performance, wherein the cost agent calculates a cost state-action vector based on the highest cost reward Q value;
a safe agent configured to receive, as an input, a message from the cost agent and calculate a highest safe reward Q value based on the state-action vectors associated with the aircraft and based on the cost state-action vector, the highest safe reward Q value is associated with landing the aircraft at a predetermined destination or a calculated emergency destination in response to an emergency, wherein the safe agent calculates a safe state-action vector based on the highest safe reward Q value;
wherein the state-action vectors include state data associated with the aircraft and action data associated with the aircraft;
wherein the artificial neural network determines an agent action based on the safe state-action vector and the cost state-action vector; and
a user output device positioned within a cockpit of the aircraft and configured to provide an indication of an action to a user while the user is controlling the aircraft, wherein the action corresponds to the agent action as determined by the artificial neural network, and wherein the user output device is configured to provide a warning to the user in response to the user not performing the action.
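
The data flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. All names here (StateAction, cost_agent, safe_agent, arbitrate, advise), the agreement_bonus parameter, the emergency flag, and the toy Q values are assumptions introduced for illustration only. In particular, arbitrate is a simple rule standing in for the artificial neural network of the claim, and the way the safe agent uses the message from the cost agent is one possible reading of "based on the cost state-action vector".

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class StateAction:
    """A state-action vector: state data plus action data for the aircraft."""
    state: Tuple[float, ...]  # hypothetical fields, e.g. (altitude_ft, fuel_fraction)
    action: str


QFn = Callable[[StateAction], float]


def cost_agent(candidates: Sequence[StateAction], q_cost: QFn) -> Tuple[StateAction, float]:
    """Pick the candidate with the highest cost reward Q value."""
    best = max(candidates, key=q_cost)
    return best, q_cost(best)


def safe_agent(
    candidates: Sequence[StateAction],
    q_safe: QFn,
    cost_choice: StateAction,
    agreement_bonus: float = 0.1,  # assumed weighting, not from the claim
) -> Tuple[StateAction, float]:
    """Pick the candidate with the highest safe reward Q value.

    The message from the cost agent (cost_choice) is taken into account by
    adding a small bonus to the candidate that matches the cost agent's action.
    """
    def score(sa: StateAction) -> float:
        bonus = agreement_bonus if sa.action == cost_choice.action else 0.0
        return q_safe(sa) + bonus

    best = max(candidates, key=score)
    return best, score(best)


def arbitrate(safe_choice: StateAction, cost_choice: StateAction, emergency: bool) -> str:
    """Stand-in for the artificial neural network that determines the agent action.

    Assumed rule: follow the safe agent in an emergency or when both agents
    agree; otherwise follow the cost agent.
    """
    if emergency or safe_choice.action == cost_choice.action:
        return safe_choice.action
    return cost_choice.action


def advise(agent_action: str, user_action: Optional[str]) -> Optional[str]:
    """Return a warning string if the user did not perform the indicated action."""
    if user_action == agent_action:
        return None
    return f"WARNING: recommended action '{agent_action}' not performed"


# Toy example with made-up Q values.
state = (35000.0, 0.4)
candidates = [StateAction(state, a) for a in ("hold_cruise", "descend", "divert")]
Q_COST = {"hold_cruise": 0.9, "descend": 0.5, "divert": 0.1}
Q_SAFE = {"hold_cruise": 0.2, "descend": 0.6, "divert": 0.95}

cost_choice, _ = cost_agent(candidates, lambda sa: Q_COST[sa.action])
safe_choice, _ = safe_agent(candidates, lambda sa: Q_SAFE[sa.action], cost_choice)
agent_action = arbitrate(safe_choice, cost_choice, emergency=True)
print(agent_action, advise(agent_action, user_action="hold_cruise"))
```

In this sketch the cost agent runs first and its selection is passed to the safe agent, mirroring the order in which the claim introduces the two agents. The indication and warning of the user output device are reduced to a returned string.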