CPC G06F 9/3001 (2013.01) [G06F 9/545 (2013.01)] | 18 Claims |
1. A computer-implemented method of detecting data errors, comprising:
receiving a new value as user input for a data field;
generating a first histogram-based approximation of a first kernel density estimate generated based on valid data associated with the data field and a second histogram-based approximation of a second kernel density estimate generated based on invalid data associated with the data field;
determining a first likelihood that the new value is valid, wherein the first likelihood is equal to a first probability density of a first bin of the first histogram-based approximation that includes a log ratio of the new value to a mean value associated with the data field;
determining a second likelihood that the new value is invalid, wherein the second likelihood is equal to a second probability density of a second bin of the second histogram-based approximation that includes the log ratio of the new value to the mean value associated with the data field;
computing a likelihood ratio test statistic based on a ratio of the first likelihood that the new value is valid to the second likelihood that the new value is invalid;
classifying the new value as valid or invalid based on comparing the likelihood ratio test statistic to a likelihood ratio test threshold; and
when the new value is classified as invalid, taking one or more actions to correct the new value.
|
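The steps of claim 1 can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the Gaussian kernel, the bandwidth of 0.2, the bin range and count, the density floor, the threshold of 1.0, the sample data, and all function names (`gaussian_kde`, `histogram_approximation`, `bin_density`, `classify`) are hypothetical choices made for the example. The claim itself does not specify a kernel, a binning scheme, or a threshold value. The sketch also assumes positive values so that the log ratio is defined.

```python
import math
from bisect import bisect_right


def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def histogram_approximation(density, lo, hi, n_bins):
    """Approximate a density by a histogram whose bin heights are the
    density evaluated at each bin center (an assumed binning scheme)."""
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    heights = [density(lo + (i + 0.5) * width) for i in range(n_bins)]
    return edges, heights


def bin_density(hist, x, floor=1e-12):
    """Probability density of the bin containing x. The floor (an assumption)
    avoids division by zero for empty or out-of-range bins."""
    edges, heights = hist
    if x < edges[0] or x > edges[-1]:
        return floor
    i = min(bisect_right(edges, x) - 1, len(heights) - 1)
    return max(heights[i], floor)


def classify(new_value, mean, valid_hist, invalid_hist, threshold=1.0):
    """Likelihood ratio test on the log ratio of the new value to the mean."""
    x = math.log(new_value / mean)  # assumes new_value > 0 and mean > 0
    l_valid = bin_density(valid_hist, x)      # first likelihood
    l_invalid = bin_density(invalid_hist, x)  # second likelihood
    statistic = l_valid / l_invalid
    label = "valid" if statistic >= threshold else "invalid"
    return label, statistic


# Hypothetical history for one data field (illustrative numbers only).
MEAN = 100.0
valid_values = [90.0, 95.0, 100.0, 105.0, 110.0, 98.0, 102.0]
invalid_values = [1000.0, 950.0, 10.0, 10.5, 1020.0, 9.8]  # e.g. misplaced decimal

valid_logs = [math.log(v / MEAN) for v in valid_values]
invalid_logs = [math.log(v / MEAN) for v in invalid_values]

valid_hist = histogram_approximation(gaussian_kde(valid_logs, 0.2), -4.0, 4.0, 80)
invalid_hist = histogram_approximation(gaussian_kde(invalid_logs, 0.2), -4.0, 4.0, 80)

for value in (101.0, 1010.0):
    label, _ = classify(value, MEAN, valid_hist, invalid_hist)
    # A corrective action (for example, prompting the user to re-enter
    # the value) would be triggered here when label == "invalid".
    print(value, label)
```

Precomputing the histograms means each classification costs one bin lookup per histogram rather than a full pass over the stored samples, which is the practical reason to approximate the kernel density estimates in the first place. In this sketch, a value whose log ratio falls outside the bin range receives the floor density from both histograms, so its statistic is 1.0; a real system would need an explicit policy for that case.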