CPC G16H 10/20 (2018.01) [G06N 5/04 (2013.01); G06N 20/00 (2019.01); G16H 50/20 (2018.01)] | 9 Claims |
1. A computer-implemented method comprising:
receiving data during time intervals from a plurality of mobile health sensors, including sensors within a single device or multiple devices;
receiving a diary that records subject state determinations;
integrating the data from the mobile health sensors and the subject state determinations from the diary;
cleaning the integrated data;
determining which variables optimally predict the subject state;
partitioning said integrated and cleaned data using a combination of ranges of values for the variables that optimally predict the subject state;
generating a training set of data comprising a portion of the integrated, cleaned, and partitioned data;
training a predictive model using the training set and a machine learning algorithm; and
generating the predictive model of the state of the subject of the clinical trial based on the training set; and
using the predictive model to determine subject state for data without the diary that records said subject state determination,
wherein the subject state comprises a digital bio-marker for a disease condition, and wherein to determine the ranges of the values for the variables that optimally predict the subject state, the method further comprises:
(a) for a first variable X1, using a range of data, X1>A, where A is a defined value, to classify the data according to a first state Y1 and a second state Y2;
(b) computing a ratio between Y1 and Y2, or between Y1 and Y1+Y2, for data where X1 is not greater than A and for data where X1 is greater than A, respectively; and
iteratively repeating (a) and (b) for selected values of A to maximize said ratio between Y1 and Y2, or between Y1 and Y1+Y2.
|
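The threshold search recited in steps (a) and (b) of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and the claim leaves several details open, so the following are assumptions made only for illustration: the function names `y1_fraction` and `best_threshold`, the `min_side` parameter (a guard against near-empty partitions), the use of the Y1/(Y1+Y2) form of the ratio, and the choice to score each candidate A by the larger of the two per-side ratios. The sample data is invented.

```python
from typing import Iterable, List, Optional, Tuple

Sample = Tuple[float, str]  # (value of X1, state label "Y1" or "Y2")


def y1_fraction(labels: List[str]) -> Optional[float]:
    """Return count(Y1) / (count(Y1) + count(Y2)), or None for an empty side."""
    if not labels:
        return None
    return sum(1 for state in labels if state == "Y1") / len(labels)


def best_threshold(
    samples: List[Sample],
    candidates: Iterable[float],
    min_side: int = 1,  # hypothetical guard, not recited in the claim
) -> Optional[Tuple[float, float]]:
    """Try each candidate A and return (A, score) with the highest score.

    Step (a): split the data into X1 <= A and X1 > A.
    Step (b): compute the Y1/(Y1+Y2) ratio on each side.
    The score for A is assumed here to be the larger of the two ratios.
    Ties keep the earliest candidate.
    """
    best: Optional[Tuple[float, float]] = None
    for a in candidates:
        low = [state for x, state in samples if x <= a]   # X1 not greater than A
        high = [state for x, state in samples if x > a]   # X1 greater than A
        if len(low) < min_side or len(high) < min_side:
            continue  # skip partitions with too few observations on one side
        score = max(y1_fraction(low), y1_fraction(high))
        if best is None or score > best[1]:
            best = (a, score)
    return best


# Invented example data: X1 readings paired with diary-recorded states.
samples = [
    (1.0, "Y2"), (2.0, "Y2"), (3.0, "Y2"), (4.0, "Y1"),
    (5.0, "Y2"), (6.0, "Y1"), (7.0, "Y1"), (8.0, "Y1"),
]
result = best_threshold(samples, candidates=[2.0, 3.0, 4.0, 5.0, 6.0], min_side=2)
print(result)  # (5.0, 1.0)
```

In this example every reading with X1 > 5.0 is labeled Y1, so A = 5.0 gives a ratio of 1.0 on the upper side. The range X1 > A found this way for each selected variable is the kind of value range the claim then combines to partition the integrated and cleaned data.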