US 12,443,800 B2
Generation of synthetic question-answer pairs using a document classifier and classification explainer
Joel David Stremmel, Iowa City, IA (US); Eran Halperin, Santa Monica, CA (US); Sanjit S Batra, Redwood City, CA (US); Ardavan Saeedi, Jersey City, NJ (US); and Hamid Reza Hassanzadeh, Suwanee, GA (US)
Assigned to UnitedHealth Group Incorporated, Minnetonka, MN (US)
Filed by UnitedHealth Group Incorporated, Minnetonka, MN (US)
Filed on May 10, 2023, as Appl. No. 18/315,112.
Claims priority of provisional application 63/482,615, filed on Feb. 1, 2023.
Prior Publication US 2024/0256782 A1, Aug. 1, 2024
Int. Cl. G06F 40/30 (2020.01); G06N 5/022 (2023.01)
CPC G06F 40/30 (2020.01) [G06N 5/022 (2013.01)]
20 Claims
 
1. A computer-implemented method comprising:
generating, by one or more processors, one or more predicted label indicators associated with a textual dataset based on inputting the textual dataset to a dataset classification predicting machine learning model;
generating, by the one or more processors, one or more prediction score indicators associated with one or more prediction explanation indicators based on inputting the textual dataset and the one or more predicted label indicators to a classification explanation predicting machine learning model;
generating, by the one or more processors, one or more structured label-explanation datasets based on the one or more predicted label indicators, the one or more prediction explanation indicators, and the one or more prediction score indicators;
generating, by the one or more processors, one or more synthetic question-answer (QA) training datasets based on the one or more structured label-explanation datasets and a prediction score threshold;
generating, by the one or more processors, a prediction output using one or more QA machine learning models that are trained based on the one or more synthetic QA training datasets; and
initiating, by the one or more processors, the performance of one or more prediction-based operations based on the prediction output.
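
The data flow recited in claim 1, up to the generation of the synthetic QA training dataset, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the keyword table `KEYWORD_WEIGHTS` stands in for both the dataset classification model and the classification explanation model, the names `classify`, `explain`, `LabelExplanation`, and `make_qa_pairs` are hypothetical, and the question template is one arbitrary choice. Training the QA model and initiating prediction-based operations are omitted.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword weights standing in for trained models (assumption).
KEYWORD_WEIGHTS = {
    "diabetes": {"insulin": 0.9, "glucose": 0.7},
    "asthma": {"inhaler": 0.8, "wheezing": 0.6},
}


@dataclass
class LabelExplanation:
    """One structured label-explanation record."""
    label: str
    explanation: str
    score: float


def tokenize(text):
    # Lowercase alphabetic tokens only; punctuation is dropped.
    return re.findall(r"[a-z]+", text.lower())


def classify(text):
    """Toy classifier: predict a label if any of its keywords appears."""
    tokens = set(tokenize(text))
    return [
        label
        for label, keywords in KEYWORD_WEIGHTS.items()
        if any(k in tokens for k in keywords)
    ]


def explain(text, labels):
    """Toy explainer: for each predicted label, emit the keywords found
    in the text as explanations, each with a prediction score."""
    tokens = set(tokenize(text))
    records = []
    for label in labels:
        for keyword, score in KEYWORD_WEIGHTS[label].items():
            if keyword in tokens:
                records.append(LabelExplanation(label, keyword, score))
    return records


def make_qa_pairs(text, records, threshold):
    """Keep explanations whose score meets the threshold and turn each
    into a synthetic question-answer pair over the source text."""
    return [
        {
            "context": text,
            "question": f"What in the text supports the label '{r.label}'?",
            "answer": r.explanation,
        }
        for r in records
        if r.score >= threshold
    ]


if __name__ == "__main__":
    doc = "Patient uses insulin daily; glucose is elevated."
    labels = classify(doc)
    records = explain(doc, labels)
    for pair in make_qa_pairs(doc, records, threshold=0.8):
        print(pair["question"], "->", pair["answer"])
```

In this sketch the prediction score threshold acts as a filter: only explanations scored at or above it become QA pairs, so a higher threshold yields a smaller training set built from higher-scoring explanations.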