| CPC G06Q 20/4016 (2013.01) | 19 Claims |

|
1. A method, comprising:
receiving, by a processor, a set of data elements, wherein the set of data elements includes a stream of events;
for each feature of a set of features, determining, by the processor, a corresponding reference distribution of the respective feature using the set of data elements, wherein the corresponding reference distribution characterizes a distribution of training data in a reference time period of the training data;
updating a histogram representing the corresponding reference distribution, wherein the update is constant in both time and memory with respect to a number of events in the stream of events contributing to the corresponding reference distribution;
for each feature of the set of features, determining, by the processor, one or more corresponding subset distributions for one or more subsets sampled from the set of data elements;
for each feature of the set of features, comparing, by the processor, the corresponding reference distribution with each of the one or more corresponding subset distributions to determine a corresponding distribution of divergences including by computing a divergence measure for each comparison of the corresponding reference distribution with the one or more corresponding subset distributions, wherein the divergence measure indicates a degree of difference between the corresponding reference distribution and the one or more corresponding subset distributions;
optimizing a memory usage and a computational cost associated with a retraining of a machine learning model including by determining an optimal time for the retraining of the machine learning model based on the degree of difference, wherein one or more features of the set of features are utilized by the machine learning model for predictive tasks; and
providing, by the processor, at least the determined distributions of divergences for the set of features associated with detection of unusual transactions for use in automated data analysis.
|
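
The steps recited in claim 1 can be illustrated for a single feature with a minimal Python sketch. The claim does not fix the histogram update rule, the divergence measure, or the retraining criterion, so the sketch makes three assumptions: a fixed-bin histogram with optional exponential decay, the Jensen-Shannon divergence as the divergence measure, and a quantile threshold over the distribution of divergences as the retraining trigger. All class, function, and parameter names are hypothetical.

```python
import bisect
import math
import random


class DecayedHistogram:
    """Fixed-bin histogram with optional exponential forgetting (assumed update rule)."""

    def __init__(self, edges, decay=1.0):
        self.edges = list(edges)                      # sorted bin edges
        self.counts = [0.0] * (len(self.edges) - 1)
        self.decay = decay

    def _bin(self, x):
        # Locate the bin and clamp out-of-range values to the first or last bin.
        idx = bisect.bisect_right(self.edges, x) - 1
        return min(max(idx, 0), len(self.counts) - 1)

    def update(self, x):
        # Cost depends on the number of bins only, not on how many events were seen.
        if self.decay < 1.0:
            self.counts = [c * self.decay for c in self.counts]
        self.counts[self._bin(x)] += 1.0

    def probs(self):
        total = sum(self.counts)
        if total == 0:
            return [1.0 / len(self.counts)] * len(self.counts)
        return [c / total for c in self.counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (assumed measure); lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def histogram_probs(values, edges):
    hist = DecayedHistogram(edges)
    for v in values:
        hist.update(v)
    return hist.probs()


def divergence_distribution(reference, data, edges, n_subsets=50, size=200, seed=0):
    """Compare the reference distribution with subsets sampled from the data."""
    rng = random.Random(seed)
    ref = reference.probs()
    return [
        js_divergence(ref, histogram_probs(rng.sample(data, size), edges))
        for _ in range(n_subsets)
    ]


def should_retrain(divergences, current, quantile=0.95):
    # Assumed criterion: retrain when the current divergence exceeds a high
    # quantile of the divergences observed within the reference period.
    ordered = sorted(divergences)
    threshold = ordered[min(len(ordered) - 1, int(quantile * len(ordered)))]
    return current > threshold


# Synthetic data for one feature; the values are illustrative only.
rng = random.Random(1)
edges = [i / 10 for i in range(11)]
train = [rng.betavariate(2, 5) for _ in range(5000)]

reference = DecayedHistogram(edges)
for value in train:
    reference.update(value)

divergences = divergence_distribution(reference, train, edges)
drifted = [rng.betavariate(5, 2) for _ in range(200)]
current = js_divergence(reference.probs(), histogram_probs(drifted, edges))
retrain = should_retrain(divergences, current)
print(retrain)
```

In this sketch each `update` call touches a fixed number of bins, which is one way to keep the per-event cost independent of the number of events in the stream. Repeating the same procedure per feature would yield one distribution of divergences per feature.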