CPC G16H 10/60 (2018.01) [G06F 16/35 (2019.01); G06N 20/00 (2019.01)] | 20 Claims |
1. A method of using machine learning to automatically extract values of clinical variables for a plurality of subjects from clinical record data, the method comprising:
using at least one processor to perform:
obtaining clinical record data associated with the plurality of subjects;
generating, using the clinical record data, a dataset for storing values of a plurality of clinical variables, the plurality of clinical variables comprising:
a subset of clinical variables designated as hybrid variables that can have their values assigned by machine learning model prediction or by manual extraction; and
a subset of clinical variables designated as non-hybrid variables that cannot have their values assigned by machine learning prediction; and
setting, for each of the plurality of subjects, a value of each of the hybrid variables in the dataset at least in part by:
processing, using a machine learning model trained to predict a value of the hybrid variable, clinical record data associated with the subject to obtain a predicted hybrid variable value and an associated confidence score;
determining, using the confidence score associated with the predicted hybrid variable value, whether to set the value of the hybrid variable for the subject to the predicted hybrid variable value;
in response to determining to set the value of the hybrid variable for the subject to the predicted hybrid variable value:
setting the value of the hybrid variable for the subject to the predicted hybrid variable value in the dataset; and
in response to determining to not set the value of the hybrid variable for the subject to the predicted hybrid variable value:
obtaining input indicating a manually extracted hybrid variable value for the subject; and
setting the value of the hybrid variable for the subject to the manually extracted hybrid variable value in the dataset; and
setting, for each of the plurality of subjects, values of the non-hybrid variables to manually extracted values of the non-hybrid variables without obtaining machine learning predicted values of the non-hybrid variables.
|
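For illustration only, the control flow recited in claim 1 can be sketched in Python. This is a minimal sketch under stated assumptions, not the claimed implementation: the claim says only that the confidence score is used to decide whether to accept the prediction, so the fixed `threshold` comparison below is an assumption. The names `build_dataset`, `hybrid_models`, `non_hybrid_vars`, and `manual_extract` are hypothetical and do not come from the claim text.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a trained model maps a subject's clinical record text
# to a (predicted value, confidence score) pair.
Model = Callable[[str], Tuple[str, float]]
# Hypothetical callback standing in for a human abstractor:
# (subject_id, variable_name) -> manually extracted value.
ManualExtractor = Callable[[str, str], str]


def build_dataset(
    records: Dict[str, str],
    hybrid_models: Dict[str, Model],
    non_hybrid_vars: List[str],
    manual_extract: ManualExtractor,
    threshold: float = 0.9,  # assumed decision rule; the claim does not fix one
) -> Dict[str, Dict[str, str]]:
    """Fill one row of variable values per subject."""
    dataset: Dict[str, Dict[str, str]] = {}
    for subject_id, record in records.items():
        row: Dict[str, str] = {}
        # Hybrid variables: try the model first, fall back to manual extraction.
        for var, model in hybrid_models.items():
            predicted, confidence = model(record)
            if confidence >= threshold:
                row[var] = predicted
            else:
                row[var] = manual_extract(subject_id, var)
        # Non-hybrid variables: manual extraction only, no model is called.
        for var in non_hybrid_vars:
            row[var] = manual_extract(subject_id, var)
        dataset[subject_id] = row
    return dataset
```

In this sketch a variable is hybrid simply because it has an entry in `hybrid_models`; any variable listed in `non_hybrid_vars` never reaches a model, which mirrors the final step of the claim.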