CPC G06N 3/092 (2023.01) [G06N 3/045 (2023.01)]    20 Claims

1. A method comprising:
providing, via a processor, a plurality of inputs from a plurality of sources as input to a first neural network to define a first state associated with (1) an entity and (2) a first node from a plurality of nodes included in a reinforcement learning model;
receiving, at the processor, (1) an indication of a plurality of prospects and (2) an indication of a plurality of actions;
generating, via the processor, a constraint model that includes:
a first set of vertices based on the indication of the plurality of prospects,
a second set of vertices based on the indication of the plurality of actions,
a source vertex and a sink vertex,
a first set of edges between the first set of vertices and the second set of vertices, each edge from the first set of edges having an edge weight that defines a reward associated with a prospect from the plurality of prospects and an action from the plurality of actions,
a second set of edges between the source vertex and the first set of vertices, and
a third set of edges between the second set of vertices and the sink vertex;
identifying, via the processor and based on each edge weight associated with the first set of edges, (1) at least a portion of the first set of edges, (2) at least a portion of the second set of edges, and (3) at least a portion of the third set of edges;
determining, via the processor and in response to the identifying, an action constraint associated with the action from the plurality of actions;
providing, via the processor, the first state and the action constraint as input to the reinforcement learning model to produce, based on the reward, (1) a second state associated with a second node from the plurality of nodes and (2) an indication of the action associated with a transition from the first state to the second state;
generating, via the processor and using a second neural network, an implementation of the action based on the first state, the second state, and the indication of the action;
receiving, at the processor and in response to the implementation of the action being transmitted to the entity, a success metric determined based on an outcome of the implementation of the action;
determining, via the processor, a third state associated with the entity based on the success metric; and
retraining, via the processor, the reinforcement learning model based on the success metric.
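
For illustration only, and not as part of the claim language: the constraint model recited above can be read as a bipartite flow graph in which prospect vertices and action vertices are joined by reward-weighted edges, bracketed by a source vertex and a sink vertex. The Python sketch below assumes one possible reading of the identifying step, a maximum-reward matching computed with the Hungarian algorithm; all identifiers, the example reward matrix, and the rule that unmatched actions form the action constraint are assumptions made for illustration, not recitations from the claim.

```python
# Illustrative sketch (not part of the claim): a bipartite constraint model
# with prospect vertices, action vertices, a source, a sink, and
# reward-weighted edges, plus one assumed reading of the "identifying" step.
from dataclasses import dataclass, field

import numpy as np
from scipy.optimize import linear_sum_assignment


@dataclass
class ConstraintModel:
    prospects: list          # first set of vertices
    actions: list            # second set of vertices
    rewards: np.ndarray      # edge weights: rewards[i, j] = reward of (prospect i, action j)
    source: str = "source"   # source vertex
    sink: str = "sink"       # sink vertex
    selected_edges: list = field(default_factory=list)

    def identify_edges(self):
        """Select a maximum-reward matching between prospects and actions.

        The selected prospect-action edges (together with the incident
        source and sink edges) stand in for the identified edge portions;
        actions left unmatched are returned as the action constraint.
        This interpretation is an assumption for illustration.
        """
        rows, cols = linear_sum_assignment(self.rewards, maximize=True)
        self.selected_edges = [
            (self.prospects[i], self.actions[j], float(self.rewards[i, j]))
            for i, j in zip(rows, cols)
        ]
        allowed = {action for _, action, _ in self.selected_edges}
        return [a for a in self.actions if a not in allowed]


# Hypothetical usage with two prospects and three actions.
model = ConstraintModel(
    prospects=["p1", "p2"],
    actions=["a1", "a2", "a3"],
    rewards=np.array([[0.9, 0.1, 0.4],
                      [0.2, 0.8, 0.3]]),
)
action_constraint = model.identify_edges()
print(model.selected_edges)   # [('p1', 'a1', 0.9), ('p2', 'a2', 0.8)]
print(action_constraint)      # ['a3']
```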