US 12,294,582 B2
Communication network resilience based on distributed multi-agent Q-learning algorithm
Lan Kim Nguyen, Mission Viejo, CA (US); Elsa Newman Schaefer, Fairfax, VA (US); and Robert Arthur Hughes, Jr., Bayfield, CO (US)
Assigned to KBR WYLE SERVICES, LLC, Houston, TX (US)
Filed by KBR Wyle Services, LLC, Houston, TX (US)
Filed on Jun. 1, 2023, as Appl. No. 18/327,347.
Claims priority of provisional application 63/354,410, filed on Jun. 22, 2022.
Claims priority of provisional application 63/348,816, filed on Jun. 3, 2022.
Prior Publication US 2023/0396623 A1, Dec. 7, 2023
Int. Cl. H04L 41/046 (2022.01); H04L 9/40 (2022.01); H04L 41/16 (2022.01)

CPC H04L 63/101 (2013.01) [H04L 41/046 (2013.01); H04L 41/16 (2013.01); H04L 63/108 (2013.01)]

20 Claims

1. A method for strengthening communication network resilience, comprising, at a source agent of the communication network:

accessing an access list comprising communication relay agents available to the source agent;

accessing a Q-table, from among a plurality of Q-tables, that corresponds to the communication relay agents available to the source agent, wherein each entry in the Q-table indicates a predicted reward for transitioning from a first relay agent of the communication relay agents to a second relay agent of the communication relay agents at a specified time slot;

transitioning from communicating via a current communication relay agent to communicating via a new communication relay agent at a time slot, wherein the new communication relay agent is determined based on a set of entries in the Q-table, wherein the set of entries comprises entries in the Q-table corresponding to transitioning from the current communication relay agent to each of the communication relay agents at the time slot;

receiving data indicative of an actual reward for transitioning to the new communication relay agent; and

updating the entry in the Q-table corresponding to the transition from the current communication relay agent to the new communication relay agent at the time slot based on the received data indicative of the actual reward.
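
The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The class name `SourceAgent`, the relay names, the learning rate `alpha`, the discount factor `gamma`, the exploration rate `epsilon`, and the wrap-around of time slots are all assumptions introduced for illustration. The claim only requires that the entry be updated "based on" the actual reward; the standard tabular Q-learning update is used here as one plausible choice. The sketch also holds a single Q-table, whereas the claim selects one from a plurality of Q-tables.

```python
import random
from collections import defaultdict


class SourceAgent:
    """Hypothetical source agent with one Q-table over its available relays."""

    def __init__(self, access_list, num_slots,
                 alpha=0.5, gamma=0.9, epsilon=0.0, seed=0):
        self.access_list = list(access_list)  # relay agents available to this source
        self.num_slots = num_slots            # number of time slots (assumed cyclic)
        self.alpha = alpha                    # learning rate (assumed)
        self.gamma = gamma                    # discount factor (assumed)
        self.epsilon = epsilon                # exploration probability (assumed)
        self.rng = random.Random(seed)
        # q[(current_relay, next_relay, time_slot)] -> predicted reward
        self.q = defaultdict(float)

    def candidate_entries(self, current, slot):
        """Entries for moving from `current` to each available relay at `slot`."""
        return {nxt: self.q[(current, nxt, slot)] for nxt in self.access_list}

    def choose_relay(self, current, slot):
        """Pick the new relay from the candidate entries (epsilon-greedy)."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.access_list)
        entries = self.candidate_entries(current, slot)
        return max(entries, key=entries.get)

    def update(self, current, new, slot, reward):
        """Update the (current, new, slot) entry from the observed reward."""
        next_slot = (slot + 1) % self.num_slots
        best_next = max(self.q[(new, nxt, next_slot)] for nxt in self.access_list)
        key = (current, new, slot)
        self.q[key] += self.alpha * (reward + self.gamma * best_next - self.q[key])
        return self.q[key]
```

For example, after `agent = SourceAgent(["relay_a", "relay_b", "relay_c"], num_slots=4)` and `agent.q[("relay_a", "relay_b", 0)] = 1.0`, the call `agent.choose_relay("relay_a", 0)` selects `"relay_b"`. If the observed reward is then 0.0, `agent.update("relay_a", "relay_b", 0, 0.0)` lowers that entry to 0.5, so the prediction moves toward the actual reward.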