US 11,789,914 B2
Data correctness optimization
Swapnasarit Sahu, Berlin (DE); Ernest Kirubakaran Selvaraj, Berlin (DE); Tushar Agarwal, Berlin (DE); Projjol Banerjea, Berlin (DE); Daniel Heer, Berlin (DE); and Sathish Kumar K S, Berlin (DE)
Assigned to zeotap GmbH, Berlin (DE)
Filed by zeotap GmbH, Berlin (DE)
Filed on May 20, 2020, as Appl. No. 16/878,713.
Prior Publication US 2021/0365420 A1, Nov. 25, 2021
Int. Cl. G06F 16/00 (2019.01); G06F 16/215 (2019.01); G06F 17/18 (2006.01)

CPC G06F 16/215 (2019.01) [G06F 17/18 (2013.01)]

18 Claims

1. A computer implemented method for improving data correctness of measured values using a ground truth dataset, comprising:

using a transceiver and at least one processor coupled to the transceiver for:

receiving a plurality of datasets from different data sources, the datasets comprising a plurality of data elements, wherein each data element includes an identifier and at least one measured value associated with the identifier;

determining data correctness values for the measured values using at least one of a panel and calibration measurements for checking correctness of the measured values, wherein a data correctness value is associated with a probability that a measured value is correct;

adding a data element including a single data correctness value for each measured value for each identifier with which a respective measured value is associated to the ground truth dataset, wherein the single data correctness value being based on the determined data correctness values for the measured values, whereby the data correctness values in the ground truth dataset define probability distributions of data correctness for the measured values; and

outputting the ground truth dataset, wherein the ground truth dataset is used to determine data correctness of measured values included in at least one new dataset by

receiving the new dataset from a data source, the new dataset comprising a plurality of data elements, wherein each data element of the new dataset includes an identifier and at least one measured value associated with the identifier; and

using respective data correctness values in the ground truth dataset for the measured values of respective data elements of the ground truth dataset determined as being with known identifiers, in determining data correctness values for measured values of data elements of the new dataset, wherein a known identifier is an identifier which is included in the ground truth dataset and in the new dataset, wherein determining data elements of the ground truth dataset with known identifiers comprising comparing identifiers included in the new dataset with identifiers included in the ground truth dataset, wherein data correctness values in the ground truth dataset are assigned for measured values of data elements of the new dataset based on overlapping of the measured values of data elements of the new dataset with measured values of data elements of the ground truth dataset with known identifiers.
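
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and everything specific in it is an assumption made for illustration: the panel is modeled as a mapping from identifier to a verified value, a source's data correctness value is taken to be the fraction of its values that agree with the panel, the single value stored per identifier and measured value is the plain average of the contributing sources' values, and "overlapping" is read as exact equality of measured values. The function names and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# A data element is modeled as a tuple (identifier, measured_value).

def source_accuracy(dataset, panel):
    """Fraction of a source's values that agree with the panel (assumed check)."""
    checked = [(ident, value) for ident, value in dataset if ident in panel]
    if not checked:
        return None  # this source cannot be checked against the panel
    return sum(panel[ident] == value for ident, value in checked) / len(checked)

def build_ground_truth(datasets, panel):
    """Keep one correctness value per (identifier, measured value) pair."""
    collected = defaultdict(list)
    for dataset in datasets:
        accuracy = source_accuracy(dataset, panel)
        if accuracy is None:
            continue
        for ident, value in dataset:
            collected[(ident, value)].append(accuracy)
    # Assumption: the single value is the average over contributing sources.
    return {key: mean(values) for key, values in collected.items()}

def score_new_dataset(new_dataset, ground_truth):
    """Assign ground-truth correctness values to overlapping measured values."""
    truth_ids = {ident for ident, _ in ground_truth}
    new_ids = {ident for ident, _ in new_dataset}
    known_ids = truth_ids & new_ids  # identifiers present in both datasets
    return {
        (ident, value): ground_truth[(ident, value)]
        for ident, value in new_dataset
        if ident in known_ids and (ident, value) in ground_truth
    }

# Hypothetical sample data: identifiers with an age-band attribute.
panel = {"u1": "25-34", "u2": "35-44"}
source_a = [("u1", "25-34"), ("u2", "35-44"), ("u3", "25-34")]
source_b = [("u1", "25-34"), ("u2", "25-34"), ("u3", "45-54")]

ground_truth = build_ground_truth([source_a, source_b], panel)
new_dataset = [("u1", "25-34"), ("u3", "45-54"), ("u9", "25-34")]
scores = score_new_dataset(new_dataset, ground_truth)
```

In this sketch the element with identifier `u9` receives no value because `u9` is not a known identifier. The values stored per identifier (for example, two candidate age bands for `u2`) play the role of the per-identifier distribution of data correctness mentioned in the claim, although they are not normalized here.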