US 12,079,312 B2
Machine learning outlier detection using weighted histogram-based outlier scoring (W-HBOS)
Yuting Jia, Redmond, WA (US); Jayaram N. M. Nanduri, Issaquah, WA (US); Kiyoung Yang, Sammamish, WA (US); and Yini Zhang, Bellevue, WA (US)
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Dec. 14, 2020, as Appl. No. 17/121,475.
Claims priority of provisional application 63/085,530, filed on Sep. 30, 2020.
Prior Publication US 2022/0101069 A1, Mar. 31, 2022
Int. Cl. G06N 20/00 (2019.01); G06F 17/18 (2006.01); G06F 18/2135 (2023.01); G06F 18/2433 (2023.01); G06Q 10/0633 (2023.01)
CPC G06F 18/2433 (2023.01) [G06F 17/18 (2013.01); G06F 18/2135 (2023.01); G06Q 10/0633 (2013.01)]
19 Claims
 
1. A computer-implemented, outlier detection method, comprising:
identifying a selected set of features from an initial set of features extracted from a data set, where the selected set of features is identified by reducing dependency between features of the initial set of features via an intermediate feature selection;
generating a training data set by transforming the selected set of features, where the transformation further reduces feature dependency between the selected set of features;
training an outlier identification model, comprising a Weighted Histogram-based Outlier Scoring (W-HBOS) model, on the training data set via unsupervised training;
selecting one of a higher value outlier subset and a lower value outlier subset included in an outlier set obtained by applying the outlier identification model to real-world data,
wherein the higher value outlier subset and the lower value outlier subset are identified based on comparison of outliers in the outlier set with a median value of the outlier set; and
executing one or more automated tasks using entities from the real-world data identified based on the subset of outliers.
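
The scoring and subset-selection steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and it rests on several assumptions that the claim does not fix: equal-width histograms, a per-feature score of log(1 / bin height) as in conventional HBOS, caller-supplied feature weights, and a median comparison on a caller-chosen value for each outlier. The function names `fit_histograms`, `whbos_score`, and `split_by_median` are hypothetical, and the feature selection and transformation steps are omitted.

```python
import math
from statistics import median


def fit_histograms(rows, bins=10):
    """Build one equal-width histogram per feature (unsupervised: no labels used).

    Bin heights are scaled so the tallest bin of each feature has height 1.
    """
    models = []
    for column in zip(*rows):
        lo, hi = min(column), max(column)
        width = (hi - lo) / bins or 1.0  # guard against a constant feature
        counts = [0] * bins
        for x in column:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        tallest = max(counts)
        models.append((lo, width, [c / tallest for c in counts]))
    return models


def whbos_score(row, models, weights, eps=1e-9):
    """Weighted sum over features of log(1 / bin height); rarer bins score higher.

    The weights are an assumption of this sketch; the claim does not say
    how they are derived. Values outside the fitted range are assigned to
    the nearest end bin.
    """
    score = 0.0
    for x, (lo, width, heights), w in zip(row, models, weights):
        idx = min(max(int((x - lo) / width), 0), len(heights) - 1)
        score += w * math.log(1.0 / (heights[idx] + eps))
    return score


def split_by_median(outliers, value_of):
    """Split an outlier set into higher and lower subsets around its median value.

    value_of is a hypothetical accessor returning the quantity to compare.
    Outliers equal to the median are placed in the lower subset here.
    """
    cut = median(value_of(o) for o in outliers)
    higher = [o for o in outliers if value_of(o) > cut]
    lower = [o for o in outliers if value_of(o) <= cut]
    return higher, lower
```

In use, a caller would fit the histograms on the training data, score each real-world record, treat records whose score exceeds some chosen threshold as the outlier set, and then pass that set to `split_by_median` to pick the subset that drives the automated tasks. The threshold and the choice of subset are left to the caller in this sketch.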