US 11,676,071 B2
Identifying and ranking anomalous measurements to identify faulty data sources in a multi-source environment
Amit Vaid, Bengaluru (IN); Karthik Gvd, Bangalore (IN); Vijayalakshmi Krishnamurthy, Sunnyvale, CA (US); and Vidya Mani, Bangalore (IN)
Assigned to Oracle International Corporation, Redwood Shores, CA (US)
Filed by Oracle International Corporation, Redwood Shores, CA (US)
Filed on Jan. 13, 2021, as Appl. No. 17/147,737.
Claims priority of application No. 202041027680 (IN), filed on Jun. 30, 2020; and application No. 202041028694 (IN), filed on Jul. 6, 2020.
Prior Publication US 2021/0406110 A1, Dec. 30, 2021
Int. Cl. G06F 11/00 (2006.01); G06N 20/00 (2019.01); G06F 11/07 (2006.01); G06F 18/22 (2023.01); G06F 18/25 (2023.01); G06F 18/2135 (2023.01)
CPC G06N 20/00 (2019.01) [G06F 11/0754 (2013.01); G06F 11/0766 (2013.01); G06F 18/2135 (2023.01); G06F 18/22 (2023.01); G06F 18/251 (2023.01)]
22 Claims
 
1. One or more non-transitory machine-readable media storing instructions which, when executed by one or more processors, cause:
  obtaining a first data point and a plurality of additional data points, the first data point and the plurality of additional data points each comprising a plurality of measurements from a plurality of sources, the plurality of measurements including at least a first measurement from a first source and a second measurement from a second source;
  determining that the first data point is an anomalous data point based on a deviation of the first data point from the plurality of additional data points;
  determining a contribution of each of the plurality of measurements of the first data point, including a contribution of the first measurement and a contribution of the second measurement, to the deviation of the first data point from the plurality of additional data points at least by:
    performing a principal component analysis of the first data point and the plurality of additional data points to identify a variation of each data point from two or more principal components;
    determining a first difference between the first measurement of the first data point and a corresponding measurement of a first principal component; and
    determining a second difference between the second measurement of the first data point and a corresponding measurement of a second principal component; and
  ranking each measurement of the plurality of measurements based on a contribution of each measurement of the plurality of measurements to the deviation of the anomalous data point from the plurality of additional data points, wherein ranking each measurement of the plurality of measurements of the first data point comprises:
    responsive to determining that the first measurement of the plurality of measurements of the first data point has a higher contribution to the deviation of the first data point than the second measurement of the plurality of measurements of the first data point: ranking the first measurement higher than the second measurement in a ranking of the plurality of measurements of the first data point.
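
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It rests on several assumptions that do not come from the patent text: the function name `rank_measurement_contributions`, its parameters, the toy data, and the anomaly rule (reconstruction error above three times the median error of the other points) are all hypothetical. The sketch also reads the claimed "difference" between a measurement and the principal components as the per-measurement residual left after projecting the data point onto the top principal components, which is only one possible interpretation.

```python
import numpy as np


def rank_measurement_contributions(first_point, other_points,
                                   n_components=2, threshold=3.0):
    """Hypothetical helper: flag `first_point` and rank its measurements.

    Assumption: a measurement's "contribution" is its squared residual after
    projecting the point onto the top `n_components` principal components.
    """
    first_point = np.asarray(first_point, dtype=float)
    other_points = np.asarray(other_points, dtype=float)

    # Row 0 is the point under test; the remaining rows are the other points.
    data = np.vstack([first_point, other_points])
    centered = data - data.mean(axis=0)

    # PCA via SVD: the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]

    # Per-measurement difference between each point and its projection
    # onto the retained principal components.
    residual = centered - centered @ components.T @ components

    # Deviation of each data point: total squared residual.
    errors = (residual ** 2).sum(axis=1)

    # Assumed anomaly rule (illustrative only): compare against the median
    # deviation of the other points.
    is_anomalous = bool(errors[0] > threshold * np.median(errors[1:]))

    # Contribution of each measurement of the first point, ranked high to low.
    contributions = residual[0] ** 2
    ranking = np.argsort(-contributions)
    return errors, contributions, ranking, is_anomalous


# Toy data (hypothetical): sources 0-2 track one quantity, sources 3-4 another.
normal = np.array([[s, s, s, t, t] for s in range(11) for t in (0, 10)],
                  dtype=float)
# Source 2 reports 9 where the other two sources in its group report 5.
suspect = np.array([5, 5, 9, 5, 5], dtype=float)

errors, contributions, ranking, is_anomalous = rank_measurement_contributions(
    suspect, normal
)
print("anomalous:", is_anomalous)
print("measurements ranked by contribution:", ranking.tolist())
```

In this toy data the suspect point is flagged and the measurement from source index 2 ranks first, ahead of the two sources it disagrees with. A production system would likely fit the principal components on trusted data only and choose the threshold statistically; both choices are left open here.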