CPC G06F 16/215 (2019.01) | 24 Claims |
1. A computerized method comprising:
computing, by a drift detection subsystem using machine-learning models, a first probability distribution associated with content of a first data sample of a data stream and a second data sample of the data stream, the second data sample operating as a reference;
computing, by the drift detection subsystem using machine-learning models, a second probability distribution associated with content of the second data sample;
conducting analytics to compute a difference between content of the first probability distribution that is based on a first data point of the first data sample and content of the second probability distribution that is based on a first data point of the second data sample, wherein each data point of the first data sample, including the first data point, corresponds to machine data directed to a purpose separate and distinct from other data points of the first data sample;
determining that categorical drift is occurring for the data stream in response to the difference failing to satisfy an error threshold; and
in response to determining (a) that categorical drift is occurring and (b) that a prescribed number of data points associated with one or more data samples of the data stream identify potential categorical drift conditions, performing, by the drift detection subsystem (i) recomputing the error threshold with updated training data samples and (ii) issuing an alert message to an administrator indicating categorical drift is occurring.
|
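The steps recited in claim 1 can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the patented implementation: the claim computes the distributions with machine-learning models, whereas this sketch substitutes plain empirical category frequencies. It also uses total variation distance as the "difference" and treats a difference above the threshold as "failing to satisfy" it. All names (`distribution`, `total_variation`, `fit_threshold`, `check_drift`, `margin`, `min_flagged`) are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# A data sample maps each data point (field name) to its observed categorical values.
Sample = Dict[str, Sequence[str]]


def distribution(values: Sequence[str]) -> Dict[str, float]:
    """Empirical probability distribution over categories (stand-in for an ML model)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()} if total else {}


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance between two categorical distributions (0.0 to 1.0)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def fit_threshold(training: List[Sample], reference: Sample, margin: float = 1.5) -> float:
    """Assumed rule: largest distance seen in training samples, scaled by a margin."""
    distances = [
        total_variation(distribution(s[f]), distribution(reference[f]))
        for s in training
        for f in reference
        if f in s
    ]
    return margin * max(distances, default=0.0)


def check_drift(
    current: Sample,
    reference: Sample,
    threshold: float,
    min_flagged: int,
    training: List[Sample],
    alert: Callable[[str], None],
) -> Tuple[bool, float]:
    """Return (drift_detected, threshold), recomputing the threshold when drift is confirmed."""
    # Compare each data point of the current sample with the same data point
    # of the reference sample and flag those whose difference exceeds the threshold.
    flagged = [
        f
        for f in reference
        if f in current
        and total_variation(distribution(current[f]), distribution(reference[f])) > threshold
    ]
    # Act only when drift is seen and the prescribed number of data points is flagged.
    if flagged and len(flagged) >= min_flagged:
        new_threshold = fit_threshold(training, reference)
        alert("categorical drift detected in: " + ", ".join(flagged))
        return True, new_threshold
    return False, threshold
```

Here `min_flagged` plays the role of the "prescribed number of data points", and `alert` is any callable that delivers the message to an administrator, for example a function that sends an email or writes to a log.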