US 12,346,822 B2
Systems and methods for generating models to identify states and actions
Matthew Millen, Yarrow Point, WA (US); Srinath Sridhar, San Francisco, CA (US); and Thomas Tobin, Tiburon, CA (US)
Assigned to Regie.ai, Inc., San Francisco, CA (US)
Filed by Regie.ai, Inc., San Francisco, CA (US)
Filed on Oct. 24, 2023, as Appl. No. 18/493,484.
Prior Publication US 2025/0131278 A1, Apr. 24, 2025
Int. Cl. G06N 3/092 (2023.01); G06N 3/045 (2023.01)
CPC G06N 3/092 (2023.01) [G06N 3/045 (2023.01)]
20 Claims
1. A method comprising:
providing, via a processor, a plurality of inputs from a plurality of sources as input to a first neural network to define a first state associated with (1) an entity and (2) a first node from a plurality of nodes included in a reinforcement learning model;
receiving, at the processor, (1) an indication of a plurality of prospects and (2) an indication of a plurality of actions;
generating, via the processor, a constraint model that includes:
a first set of vertices based on the indication of the plurality of prospects,
a second set of vertices based on the indication of the plurality of actions,
a source vector and a sink vector,
a first set of edges between the first set of vertices and the second set of vertices, each edge from the first set of edges having an edge weight that defines a reward associated with a prospect from the plurality of prospects and an action from the plurality of actions,
a second set of edges between the source vector and the first set of vertices, and
a third set of edges between the second set of vertices and the sink vector;
identifying, via the processor and based on each edge weight associated with the first set of edges, (1) at least a portion of the first set of edges, (2) at least a portion of the second set of edges, and (3) at least a portion of the third set of edges;
determining, via the processor and in response to the identifying, an action constraint associated with the action from the plurality of actions;
providing, via the processor, the first state and the action constraint as input to the reinforcement learning model to produce, based on the reward, (1) a second state associated with a second node from the plurality of nodes and (2) an indication of the action associated with a transition from the first state to the second state;
generating, via the processor and using a second neural network, an implementation of the action based on the first state, the second state, and the indication of the action;
receiving, at the processor and in response to the implementation of the action being transmitted to the entity, a success metric determined based on an outcome of the implementation of the action;
determining, via the processor, a third state associated with the entity based on the success metric; and
retraining, via the processor, the reinforcement learning model based on the success metric.
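
The constraint model recited in claim 1 has the shape of a flow network: a source, one vertex per prospect, one vertex per action, a sink, and reward-weighted edges between prospects and actions. The following is a minimal sketch of one way such a model could be built and solved. It is not taken from the patent. The successive-shortest-path min-cost flow solver, the integer reward scores, the per-action capacity limit, and all names (build_constraint_model, select_assignments, action_capacity) are illustrative assumptions. The claim only requires that edges be identified based on the edge weights.

```python
from collections import defaultdict

def build_constraint_model(prospects, actions, rewards, action_capacity):
    """Build a residual graph: source -> prospects -> actions -> sink.

    Hypothetical helper. Edge i is stored as [to, capacity, cost] and its
    reverse (residual) edge is stored at index i ^ 1.
    """
    edges = []
    adj = defaultdict(list)  # node -> indices into edges

    def add_edge(u, v, cap, cost):
        adj[u].append(len(edges)); edges.append([v, cap, cost])
        adj[v].append(len(edges)); edges.append([u, 0, -cost])

    for p in prospects:                        # second set of edges
        add_edge("source", ("p", p), 1, 0)
    for (p, a), reward in rewards.items():     # first set of edges
        add_edge(("p", p), ("a", a), 1, -reward)  # negate: min cost = max reward
    for a in actions:                          # third set of edges
        add_edge(("a", a), "sink", action_capacity[a], 0)
    return edges, adj

def select_assignments(edges, adj):
    """Min-cost max-flow by successive shortest paths (Bellman-Ford)."""
    nodes = list(adj)
    while True:
        dist, prev = {"source": 0}, {}
        for _ in range(len(nodes)):
            changed = False
            for u in nodes:
                if u not in dist:
                    continue
                for i in adj[u]:
                    v, cap, cost = edges[i]
                    if cap > 0 and dist[u] + cost < dist.get(v, float("inf")):
                        dist[v], prev[v], changed = dist[u] + cost, i, True
            if not changed:
                break
        if "sink" not in dist:
            break  # no augmenting path remains
        v = "sink"
        while v != "source":  # push one unit of flow along the path
            i = prev[v]
            edges[i][1] -= 1
            edges[i ^ 1][1] += 1
            v = edges[i ^ 1][0]
    # A saturated forward prospect -> action edge is a selected pairing.
    chosen = {}
    for u in nodes:
        if isinstance(u, tuple) and u[0] == "p":
            for i in adj[u]:
                if i % 2 == 0 and edges[i][1] == 0:
                    chosen[u[1]] = edges[i][0][1]
    return chosen

# Illustrative data only: integer reward scores per (prospect, action) pair.
rewards = {
    ("alice", "email"): 9, ("alice", "call"): 8,
    ("bob", "email"): 7, ("bob", "call"): 2,
    ("carol", "email"): 6, ("carol", "call"): 5,
}
edges, adj = build_constraint_model(
    ["alice", "bob", "carol"], ["email", "call"], rewards,
    action_capacity={"email": 1, "call": 2},
)
assignment = select_assignments(edges, adj)
```

In this sketch, the selected pairing for each prospect could serve as the action constraint passed to the reinforcement learning model along with the first state. With the sample data, the single email slot goes to bob rather than to alice, who has the highest individual email score, because that choice maximizes the total reward across all three prospects.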