US 12,394,508 B2
Systems and methods for training multi-armed bandit models
Mohsen Afrasiabi, Madison, WI (US); Tanzeem Choudhury, New York, NY (US); Cecilia M. Livesey, Merion Station, PA (US); Jared Dustin Martin, Minneapolis, MN (US); Herk Anthony Confer, San Francisco, CA (US); Daniel Joseph Mulcahy, Evanston, IL (US); and Rony Krell, Brooklyn, NY (US)
Assigned to UnitedHealth Group Incorporated, Minnetonka, MN (US)
Filed by UnitedHealth Group Incorporated, Minnetonka, MN (US)
Filed on Apr. 18, 2023, as Appl. No. 18/302,185.
Claims priority of provisional application 63/381,392, filed on Oct. 28, 2022.
Prior Publication US 2024/0145057 A1, May 2, 2024
Int. Cl. G16H 20/00 (2018.01); G06N 3/092 (2023.01)
CPC G16H 20/00 (2018.01) [G06N 3/092 (2023.01)]
8 Claims
 
1. A computer-implemented method for determining a treatment recommendation using a multi-armed bandit (MAB) model, the method comprising:
receiving, by one or more processors, first patient information;
determining, by the one or more processors and using the MAB model, the treatment recommendation based on the first patient information, wherein the MAB model is trained based on a MAB model treatment recommendation determined by the MAB model using second patient information and a clinical treatment recommendation determined according to clinical guidelines based on the second patient information; and
providing, by the one or more processors, the treatment recommendation, wherein the MAB model is trained by:
receiving training data including third patient information, selected treatment information, and selected treatment outcome information;
determining a reward probability distribution of treatment options of the MAB model using the training data and an artificial intelligence (AI) model;
receiving the second patient information;
determining the MAB model treatment recommendation using the MAB model configured with the reward probability distribution and a Thompson sampling technique, based on the second patient information;
determining the clinical treatment recommendation based on the second patient information;
determining a confidence score of the MAB model treatment recommendation;
determining a hybrid treatment recommendation based on the MAB model treatment recommendation, the clinical treatment recommendation, and the confidence score; and
training the MAB model based on the hybrid treatment recommendation.
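
The training steps recited above can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes binary treatment outcomes, stands in a Beta-Bernoulli posterior for the reward probability distribution that the claim derives with an AI model, ignores patient information (a non-contextual bandit), defines the confidence score as the share of repeated Thompson draws won by the leading arm, and falls back to the clinical recommendation below a threshold. The names BetaBandit, hybrid_recommendation, train_step, the treatment labels, and the 0.8 threshold are all hypothetical.

```python
import random


class BetaBandit:
    """Beta-Bernoulli bandit: one Beta(alpha, beta) posterior per treatment option.

    Assumption: outcomes are binary (success / failure); the claim does not say so.
    """

    def __init__(self, treatments, seed=None):
        self.treatments = list(treatments)
        # Uniform Beta(1, 1) prior for every arm.
        self.alpha = {t: 1.0 for t in self.treatments}
        self.beta = {t: 1.0 for t in self.treatments}
        self.rng = random.Random(seed)

    def update(self, treatment, success):
        """Fold one observed outcome into the posterior of the chosen arm."""
        if success:
            self.alpha[treatment] += 1.0
        else:
            self.beta[treatment] += 1.0

    def sample(self):
        """Thompson sampling: draw one reward estimate per arm, pick the largest."""
        draws = {
            t: self.rng.betavariate(self.alpha[t], self.beta[t])
            for t in self.treatments
        }
        return max(draws, key=draws.get)

    def recommend(self, n_draws=200):
        """Return (treatment, confidence).

        Confidence here is the fraction of Thompson draws won by the leading
        arm; this definition is an assumption made for illustration.
        """
        wins = {t: 0 for t in self.treatments}
        for _ in range(n_draws):
            wins[self.sample()] += 1
        best = max(wins, key=wins.get)
        return best, wins[best] / n_draws


def hybrid_recommendation(mab_choice, clinical_choice, confidence, threshold=0.8):
    """Use the bandit's choice when it is confident, else the guideline choice."""
    return mab_choice if confidence >= threshold else clinical_choice


def train_step(bandit, clinical_choice, observe_outcome, threshold=0.8):
    """One training iteration driven by the hybrid recommendation.

    observe_outcome is a hypothetical callback returning True/False for the
    treatment that was actually applied.
    """
    mab_choice, confidence = bandit.recommend()
    chosen = hybrid_recommendation(mab_choice, clinical_choice, confidence, threshold)
    bandit.update(chosen, observe_outcome(chosen))
    return chosen
```

In this sketch, historical records of selected treatments and their outcomes would be replayed through update to initialize the posteriors, after which train_step is called once per new patient. Early on, when the posteriors are broad and confidence is low, the clinical recommendation dominates; the bandit's own choice takes over only as the evidence for one arm accumulates.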