US 12,443,878 B2
Reinforcement learning machine learning models for intervention recommendation
V Kishore Ayyadevara, Hyderabad (IN); Rohan Khilnani, Jaipur (IN); Swaroop S. Shekar, Mysore (IN); Raghav Bali, Delhi (IN); Joseph C. Cremaldi, Memphis, TN (US); Fritz T. Wilhelm, Irvine, CA (US); and Vinod Burugupalli, Tampa, FL (US)
Assigned to Optum, Inc., Minnetonka, MN (US)
Filed by Optum, Inc., Minnetonka, MN (US)
Filed on Feb. 10, 2022, as Appl. No. 17/650,573.
Prior Publication US 2023/0252338 A1, Aug. 10, 2023
Int. Cl. G06N 20/00 (2019.01)

CPC G06N 20/00 (2019.01)

20 Claims

1. A computer-implemented method for determining an optimal intervention routine for a plurality of defined timesteps, the computer-implemented method comprising:

identifying, using one or more processors, a group of input events, wherein:

each input event of the group of input events is associated with an event category of a plurality of event categories,

each input event of the group of input events is associated with a defined timestep of the plurality of defined timesteps, and

each input event of the group of input events is associated with an event score set of a plurality of event score sets that is determined based at least in part on one or more input event features associated with the input event and using an event scoring machine learning model;

for each category-timestep pair of a plurality of category-timestep pairs that is associated with a particular event category of the plurality of event categories and a particular defined timestep of the plurality of defined timesteps, generating, using the one or more processors, a temporal event category score based at least in part on event score sets of the plurality of event score sets for a subset of the group of input events that are associated with the particular event category and the particular defined timestep;

generating, using the one or more processors, the optimal intervention routine using an intervention recommendation reinforcement learning machine learning model, wherein:

the optimal intervention routine is selected from a plurality of candidate intervention routines by the intervention recommendation reinforcement learning machine learning model via maximizing a candidate intervention routine reward measure and minimizing a candidate intervention routine loss measure,

each candidate intervention routine of the plurality of candidate intervention routines assigns a unique m-sized subset of the plurality of event categories to each defined timestep of the plurality of defined timesteps,

each candidate intervention routine of the plurality of candidate intervention routines is associated with a per-timestep reward measure for each defined timestep of the plurality of defined timesteps that is generated based at least in part on each temporal event category score that is associated with a first category-timestep pair of the plurality of category-timestep pairs that is associated with: (i) one of the plurality of event categories that are in the unique m-sized subset for the defined timestep, and (ii) the defined timestep,

each candidate intervention routine of the plurality of candidate intervention routines is associated with a per-timestep loss measure for each defined timestep of the plurality of defined timesteps that is generated based at least in part on each temporal event category score that is associated with a second category-timestep pair that is associated with: (i) one of the plurality of event categories that are not in the unique m-sized subset for the defined timestep, and (ii) the defined timestep,

the candidate intervention routine reward measure for a particular candidate intervention routine is generated based at least in part on each per-timestep reward measure for the particular candidate intervention routine, and

the candidate intervention routine loss measure for a particular candidate intervention routine is generated based at least in part on each per-timestep loss measure for the particular candidate intervention routine; and

performing, using the one or more processors, one or more prediction-based actions based at least in part on the optimal intervention routine.
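
The scoring and selection steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation, and it rests on several assumptions that the claim does not state: each event score set is a list of floats; the temporal event category score is the arithmetic mean of all scores in a category-timestep pair; reward and loss measures are plain sums; and "maximizing reward and minimizing loss" is collapsed into maximizing reward minus loss. An exhaustive search over candidate routines stands in for the intervention recommendation reinforcement learning machine learning model, and the sketch does not require the m-sized subsets to differ between timesteps. The function names and the example category labels are hypothetical.

```python
from collections import defaultdict
from itertools import combinations, product
from statistics import mean


def temporal_category_scores(events):
    """Aggregate event score sets per (category, timestep) pair.

    events: iterable of (category, timestep, score_set) tuples.
    Assumption: the aggregate is the arithmetic mean of all scores.
    """
    buckets = defaultdict(list)
    for category, timestep, score_set in events:
        buckets[(category, timestep)].extend(score_set)
    return {pair: mean(scores) for pair, scores in buckets.items()}


def best_routine(scores, categories, timesteps, m):
    """Pick the candidate routine with the highest reward minus loss.

    Assumption: brute-force enumeration replaces the reinforcement
    learning model, so this is only practical for tiny inputs.
    """
    subsets = list(combinations(categories, m))
    best, best_value = None, float("-inf")
    # A candidate routine assigns one m-sized subset to every timestep.
    for assignment in product(subsets, repeat=len(timesteps)):
        reward = 0.0  # scores of categories inside the chosen subsets
        loss = 0.0    # scores of categories left out of the subsets
        for timestep, chosen in zip(timesteps, assignment):
            for category in categories:
                score = scores.get((category, timestep), 0.0)
                if category in chosen:
                    reward += score
                else:
                    loss += score
        value = reward - loss
        if value > best_value:
            best, best_value = dict(zip(timesteps, assignment)), value
    return best, best_value


# Hypothetical input events: (category, timestep, score_set).
events = [
    ("cat_a", 0, [0.9, 0.7]),
    ("cat_b", 0, [0.2]),
    ("cat_c", 0, [0.5]),
    ("cat_a", 1, [0.1]),
    ("cat_b", 1, [0.8]),
    ("cat_c", 1, [0.6, 0.4]),
]
scores = temporal_category_scores(events)
routine, value = best_routine(scores, ["cat_a", "cat_b", "cat_c"], [0, 1], m=1)
```

Because the reward and loss in this sketch are additive across timesteps, the exhaustive search always ends up choosing the m highest-scoring categories at each timestep. A learned policy becomes relevant mainly when the measures couple timesteps or when the candidate space is too large to enumerate.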