| CPC G06F 21/6245 (2013.01) [G06F 16/2237 (2019.01); G06F 16/2458 (2019.01); G06F 16/2462 (2019.01); G06F 21/602 (2013.01); G16H 50/70 (2018.01); G06F 2221/2115 (2013.01)] | 12 Claims |

|
1. A computerized method for dataset quality quantification in a zero-trust environment, the method comprising:
receiving a sample dataset from a data steward;
applying a rule-based screening of the sample dataset to generate a heuristic quality score, wherein the heuristic quality score quantifies erroneous data within the sample dataset;
generating a sample vector set from the sample dataset;
receiving an example vector set generated from at least one of a synthetic dataset and an amalgamation of different vector sets from a plurality of different data stewards;
calculating a difference between the sample vector set and the example vector set to generate a degree of difference quality score;
combining the heuristic quality score and the degree of difference quality score into a quality metric;
determining that the quality metric is above a threshold; and
when the quality metric is above the threshold, recommending the sample dataset to a data consumer.
|
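
For illustration only, the steps recited in claim 1 can be sketched as follows. The claim does not specify the screening rules, the vectorization, the difference measure, the way the two scores are combined, or the threshold value, so every such choice here is an assumption: the hypothetical rules check an `id` and an `age` field, each record becomes a two-element vector, the difference is the Euclidean distance between centroids mapped to a score via `1 / (1 + d)`, the combination is a weighted average, and the threshold defaults to 0.8. All function and parameter names are hypothetical.

```python
import math


def heuristic_quality_score(rows):
    """Rule-based screening: fraction of records that pass the (assumed) rules."""
    if not rows:
        return 0.0

    def passes(row):
        age = row.get("age")
        # Assumed rules: non-empty id, numeric age within a plausible range.
        return bool(row.get("id")) and isinstance(age, (int, float)) and 0 <= age <= 120

    return sum(passes(r) for r in rows) / len(rows)


def to_vector_set(rows):
    """Assumed vectorization: [scaled age, id-present flag] per record."""
    vectors = []
    for row in rows:
        age = row.get("age")
        age = float(age) if isinstance(age, (int, float)) else 0.0
        vectors.append([age / 120.0, 1.0 if row.get("id") else 0.0])
    return vectors


def centroid(vectors):
    """Component-wise mean of a non-empty vector set."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def difference_quality_score(sample_vectors, example_vectors):
    """Map centroid distance to (0, 1]; identical centroids give 1.0."""
    d = math.dist(centroid(sample_vectors), centroid(example_vectors))
    return 1.0 / (1.0 + d)


def quality_metric(heuristic, difference, weight=0.5):
    """Assumed combination: weighted average of the two scores."""
    return weight * heuristic + (1.0 - weight) * difference


def recommend(rows, example_vectors, threshold=0.8):
    """Return (recommend?, metric) for a sample dataset."""
    if not rows or not example_vectors:
        return False, 0.0
    h = heuristic_quality_score(rows)
    d = difference_quality_score(to_vector_set(rows), example_vectors)
    metric = quality_metric(h, d)
    return metric > threshold, metric


# Example vector set, e.g. derived from a synthetic dataset (values made up).
example_vectors = [[0.25, 1.0], [0.5, 1.0]]
clean_rows = [{"id": "a", "age": 30}, {"id": "b", "age": 60}]
dirty_rows = [{"id": "", "age": -5}, {"id": "c", "age": 500}]

clean_ok, clean_metric = recommend(clean_rows, example_vectors)
dirty_ok, dirty_metric = recommend(dirty_rows, example_vectors)
```

In this sketch only vector sets, not raw records, are compared against the example set, which loosely mirrors the zero-trust framing of the claim; a real system would substitute its own rules, embedding, and distance measure.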