US 12,244,629 B2
Systems and methods for applying reinforcement learning to cybersecurity graphs
James Korge, Brooklyn, NY (US); Damion Irving, Brooklyn, NY (US); Jeffrey L. Thomas, Columbus, OH (US); and Donald Bathurst, Denver, CO (US)
Assigned to Reveald Holdings, Inc., New York, NY (US)
Filed by Reveald Holdings, Inc., New York, NY (US)
Filed on Jul. 29, 2022, as Appl. No. 17/816,009.
Claims priority of provisional application 63/227,963, filed on Jul. 30, 2021.
Prior Publication US 2023/0034303 A1, Feb. 2, 2023
Int. Cl. H04L 29/08 (2006.01); G06F 16/25 (2019.01); G06F 16/901 (2019.01); G06F 16/953 (2019.01); H04L 9/40 (2022.01); H04L 12/26 (2006.01); H04L 12/58 (2006.01)

CPC H04L 63/1433 (2013.01)

15 Claims

1. A cybersecurity method for determining exploits, comprising the steps of:

receiving a graph representing a digital network of a plurality of nodes forming a federated learning network, the graph including at least one vulnerability for each of the plurality of nodes;

receiving, for the plurality of nodes, a plurality of embeddings based on the graph, wherein the plurality of embeddings include a vector of real numbers representing the plurality of nodes in the graph;

assigning an agent an initial node from the plurality of nodes;

querying the graph to obtain a plurality of accessible nodes and at least one vulnerability for the accessible nodes;

determining a transition for the agent to take from the initial node to a next accessible node from the plurality of accessible nodes;

computing, using a neural network, a reward for moving to the next accessible node;

assigning the agent a new state corresponding to the next accessible node;

collecting, by a collected experience database, a history of node assignments of the agent, a plurality of connections taken by the agent, and a plurality of rewards the agent received for transitioning across the plurality of connections;

updating a plurality of parameters of a neural network using the data collected by the collected experience database, wherein the information collected by each of a plurality of agents is used to further update the plurality of parameters of the neural network while not sharing graph data contributed by a plurality of graphs; and

determining, by the agent, what action from a plurality of available actions to take next using the neural network.
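
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything named in it is an assumption made for illustration: the toy graph, the two-dimensional embeddings, the reward taken directly from a node's vulnerability count, the epsilon-greedy transition rule, and the parameter-averaging step. A single linear layer stands in for the claimed neural network.

```python
import random

# Hypothetical toy graph: node -> (accessible neighbours, vulnerability count).
GRAPH = {
    "web": (["app", "db"], 1),
    "app": (["db"], 2),
    "db": (["web"], 3),
}
# Hypothetical embeddings: one vector of real numbers per node.
EMBEDDINGS = {"web": [1.0, 0.0], "app": [0.0, 1.0], "db": [1.0, 1.0]}


def query_accessible(graph, node):
    """Return (neighbour, vulnerability count) pairs reachable from node."""
    return [(n, graph[n][1]) for n in graph[node][0]]


def score(weights, embedding):
    """Single linear layer standing in for the claimed neural network."""
    return sum(w * x for w, x in zip(weights, embedding))


def run_episode(graph, embeddings, weights, start, steps, rng, epsilon=0.2):
    """Walk one agent through the graph and return its collected experience."""
    experience = []
    state = start  # the agent's initial node
    for _ in range(steps):
        options = query_accessible(graph, state)
        if not options:
            break
        if rng.random() < epsilon:
            # Explore: pick a random accessible node.
            nxt, vulns = rng.choice(options)
        else:
            # Exploit: pick the accessible node the model scores highest.
            nxt, vulns = max(options, key=lambda o: score(weights, embeddings[o[0]]))
        reward = float(vulns)  # assumed reward signal, not taken from the claim
        experience.append((state, nxt, reward))  # node, connection, reward
        state = nxt  # new state corresponding to the next accessible node
    return experience


def update(weights, embeddings, experience, lr=0.05):
    """Squared-error gradient steps moving predicted scores toward rewards."""
    new = list(weights)
    for _, nxt, reward in experience:
        emb = embeddings[nxt]
        error = reward - score(new, emb)
        new = [w + lr * error * x for w, x in zip(new, emb)]
    return new


def federated_average(weight_sets):
    """Combine parameters from several agents; no graph data is exchanged."""
    count = len(weight_sets)
    return [sum(ws[i] for ws in weight_sets) / count for i in range(len(weight_sets[0]))]
```

In this sketch each agent would call `run_episode` and `update` against its own graph, and only the resulting weight lists would be passed to `federated_average`, which is one simple way to share learned parameters without sharing graph data.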