US 11,854,675 B1
Machine learning extraction of clinical variable values for subjects from clinical record data
Brett Wittmershaus, Brooklyn, NY (US); Guy Amster, Hoboken, NJ (US); Michael Waskom, New York, NY (US); Natalie Roher, New York, NY (US); Nisha Singh, Queens, NY (US); Sharang Phadke, New York, NY (US); and Will Shapiro, New York, NY (US)
Assigned to Flatiron Health, Inc., New York, NY (US)
Filed by Flatiron Health, Inc., New York, NY (US)
Filed on Oct. 11, 2022, as Appl. No. 17/963,998.
Int. Cl. G16H 10/60 (2018.01); G06F 16/35 (2019.01); G06N 20/00 (2019.01)
CPC G16H 10/60 (2018.01) [G06F 16/35 (2019.01); G06N 20/00 (2019.01)]
20 Claims
1. A method of using machine learning to automatically extract values of clinical variables for a plurality of subjects from clinical record data, the method comprising:
using at least one processor to perform:
obtaining clinical record data associated with the plurality of subjects;
generating, using the clinical record data, a dataset for storing values of a plurality of clinical variables, the plurality of clinical variables comprising:
a subset of clinical variables designated as hybrid variables that can have their values assigned by machine learning model prediction or by manual extraction; and
a subset of clinical variables designated as non-hybrid variables that cannot have their values assigned by machine learning prediction; and
setting, for each of the plurality of subjects, a value of each of the hybrid variables in the dataset at least in part by:
processing, using a machine learning model trained to predict a value of the hybrid variable, clinical record data associated with the subject to obtain a predicted hybrid variable value and an associated confidence score;
determining, using the confidence score associated with the predicted hybrid variable value, whether to set the value of the hybrid variable for the subject to the predicted hybrid variable value;
in response to determining to set the value of the hybrid variable for the subject to the predicted hybrid variable value:
setting the value of the hybrid variable for the subject to the predicted hybrid variable value in the dataset; and
in response to determining to not set the value of the hybrid variable for the subject to the predicted hybrid variable value:
obtaining input indicating a manually extracted hybrid variable value for the subject; and
setting the value of the hybrid variable for the subject to the manually extracted hybrid variable value in the dataset; and
setting, for each of the plurality of subjects, values of the non-hybrid variables to manually extracted values of the non-hybrid variables without obtaining machine learning predicted values of the non-hybrid variables.
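The control flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`Variable`, `build_dataset`, `models`, `manual_extract`, `threshold`) is hypothetical. The claim only requires that the decision use the confidence score, so the fixed-threshold comparison below is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple

# A model maps a subject's clinical record data to (predicted value, confidence score).
Model = Callable[[str], Tuple[str, float]]
# A manual extractor returns a human-abstracted value for (subject, variable name).
ManualExtractor = Callable[[str, str], str]


@dataclass(frozen=True)
class Variable:
    name: str
    hybrid: bool  # True: value may come from a model prediction or manual extraction


def build_dataset(
    subjects: Iterable[str],
    variables: Iterable[Variable],
    records: Dict[str, str],
    models: Dict[str, Model],
    manual_extract: ManualExtractor,
    threshold: float = 0.9,  # assumed decision rule; the claim does not specify one
) -> Dict[str, Dict[str, str]]:
    """Fill one row per subject, following the hybrid / non-hybrid split."""
    variables = list(variables)
    dataset: Dict[str, Dict[str, str]] = {}
    for subject in subjects:
        row: Dict[str, str] = {}
        for var in variables:
            if var.hybrid:
                # Hybrid variable: obtain a prediction and its confidence score.
                value, confidence = models[var.name](records[subject])
                if confidence >= threshold:
                    # Confident enough: keep the predicted value.
                    row[var.name] = value
                else:
                    # Otherwise fall back to a manually extracted value.
                    row[var.name] = manual_extract(subject, var.name)
            else:
                # Non-hybrid variable: manual extraction only, no model is called.
                row[var.name] = manual_extract(subject, var.name)
        dataset[subject] = row
    return dataset
```

In this sketch a per-variable model is looked up only for hybrid variables, so a non-hybrid variable never triggers a prediction, mirroring the final step of the claim.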