| CPC G16H 20/00 (2018.01) [G06N 3/092 (2023.01)] | 8 Claims |

|
1. A computer-implemented method for determining a treatment recommendation using a multi-armed bandit (MAB) model, the method comprising:
receiving, by one or more processors, first patient information;
determining, by the one or more processors and using the MAB model, the treatment recommendation based on the first patient information, wherein the MAB model is trained based on a MAB model treatment recommendation determined by the MAB model using second patient information and a clinical treatment recommendation determined according to clinical guidelines based on the second patient information; and
providing, by the one or more processors, the treatment recommendation, wherein the MAB model is trained by:
receiving training data including third patient information, selected treatment information, and selected treatment outcome information;
determining a reward probability distribution of treatment options of the MAB model using the training data and an artificial intelligence (AI) model;
receiving the second patient information;
determining the MAB model treatment recommendation using the MAB model configured with the reward probability distribution and a Thompson sampling technique, based on the second patient information;
determining the clinical treatment recommendation based on the second patient information;
determining a confidence score of the MAB model treatment recommendation;
determining a hybrid treatment recommendation based on the MAB model treatment recommendation, the clinical treatment recommendation, and the confidence score; and
training the MAB model based on the hybrid treatment recommendation.
|
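
The training steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation and rests on several assumptions that do not come from the claim: rewards are binary (treatment success or failure), simple outcome counting with Beta posteriors stands in for the AI model that determines the reward probability distribution, patient information is ignored by the bandit itself, the confidence score is a Monte Carlo estimate of the probability that the recommended arm is best, and the hybrid rule falls back to the clinical recommendation whenever that score is below a threshold. The names `BetaBandit`, `hybrid_recommendation`, `threshold`, and the treatment labels are hypothetical.

```python
import random


class BetaBandit:
    """Context-free Beta-Bernoulli bandit: one Beta posterior per treatment option."""

    def __init__(self, treatments, seed=0):
        # [alpha, beta] per treatment, starting from a uniform Beta(1, 1) prior.
        self.posteriors = {t: [1.0, 1.0] for t in treatments}
        self.rng = random.Random(seed)

    def update(self, treatment, success):
        """Add one observed outcome to the treatment's reward posterior."""
        self.posteriors[treatment][0 if success else 1] += 1.0

    def fit(self, records):
        """records: iterable of (selected_treatment, outcome_was_success) pairs."""
        for treatment, success in records:
            self.update(treatment, success)

    def recommend(self):
        """One Thompson draw: sample every posterior, return the arm with the largest sample."""
        samples = {t: self.rng.betavariate(a, b) for t, (a, b) in self.posteriors.items()}
        return max(samples, key=samples.get)

    def confidence(self, treatment, n_draws=500):
        """Assumed confidence score: estimated probability that `treatment` is the best arm."""
        hits = sum(self.recommend() == treatment for _ in range(n_draws))
        return hits / n_draws


def hybrid_recommendation(bandit, clinical_choice, threshold=0.9):
    """Assumed hybrid rule: trust the bandit only when its confidence reaches the threshold."""
    mab_choice = bandit.recommend()
    if bandit.confidence(mab_choice) >= threshold:
        return mab_choice
    return clinical_choice


# Usage with made-up training data (selected treatment, outcome) pairs.
bandit = BetaBandit(["treatment_a", "treatment_b"])
bandit.fit([("treatment_a", True)] * 50 + [("treatment_b", False)] * 50)
choice = hybrid_recommendation(bandit, clinical_choice="treatment_b")
# Training on the hybrid recommendation is modeled here as a posterior update
# with the outcome observed for the hybrid choice (a placeholder outcome below).
bandit.update(choice, success=True)
```

The fallback rule keeps the guideline-based recommendation in control while the bandit's posteriors are still wide, which is one plausible reading of how the confidence score could combine the two recommendations. The claim itself does not specify the combination rule.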