US 12,141,319 B2
Systems and methods for dataset quality quantification in a zero-trust computing environment
Mary Elizabeth Chalk, Austin, TX (US); and Robert Derward Rogers, Oakland, CA (US)
Assigned to BeeKeeperAI, Inc., Austin, TX (US)
Filed by BeeKeeperAI, Inc., Austin, TX (US)
Filed on Feb. 14, 2023, as Appl. No. 18/169,122.
Application 18/169,122 is a continuation of application No. 18/168,560, filed on Feb. 13, 2023.
Claims priority of provisional application 63/313,774, filed on Feb. 25, 2022.
Prior Publication US 2023/0274025 A1, Aug. 31, 2023
Int. Cl. G06F 16/22 (2019.01); G06F 16/2458 (2019.01); G06F 21/60 (2013.01); G06F 21/62 (2013.01); G16H 50/70 (2018.01)

CPC G06F 21/6245 (2013.01) [G06F 16/2237 (2019.01); G06F 16/2458 (2019.01); G06F 16/2462 (2019.01); G06F 21/602 (2013.01); G16H 50/70 (2018.01); G06F 2221/2115 (2013.01)]

12 Claims

1. A computerized method for dataset quality quantification in a zero-trust environment, the method comprising:

receiving a sample dataset from a data steward;

applying a rule based screening of the sample dataset to generate a heuristic quality score, wherein the heuristic quality score quantifies erroneous data within the sample dataset;

generating a sample vector set from the sample dataset;

receiving an example vector set generated from at least one of a synthetic dataset and an amalgamation of different vector sets from a plurality of different data stewards;

calculating a difference between the sample vector set and the example vector set to generate a degree of difference quality score; and

combining the heuristic quality score and the degree of difference quality score into a quality metric;

determining that the quality metric is above a threshold, and

when the quality metric is above the threshold then recommending the sample dataset to a data consumer.
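
For illustration only, the steps recited in claim 1 can be sketched in Python as follows. The claim does not specify how the screening rules, the vectorization, the difference measure, or the combination are implemented, so every choice below is an assumption made for this sketch: the function names, the per-record numeric vectors, the centroid-based Euclidean distance, the 1/(1 + distance) mapping, the equal weighting, and the 0.7 threshold are all hypothetical and are not taken from the patent.

```python
import math


def heuristic_quality_score(records, rules):
    """Rule-based screening: fraction of records that pass every rule.

    Assumption: a score of 1.0 means no erroneous records were detected.
    """
    if not records:
        return 0.0
    passed = sum(1 for r in records if all(rule(r) for rule in rules))
    return passed / len(records)


def to_vector_set(records, fields):
    """Assumed vectorization: one numeric vector per record, in field order."""
    return [[float(r[f]) for f in fields] for r in records]


def centroid(vectors):
    """Component-wise mean of a non-empty set of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def difference_quality_score(sample_vectors, example_vectors):
    """Assumed difference measure: Euclidean distance between centroids,
    mapped into (0, 1] so that identical centroids score 1.0."""
    dist = math.dist(centroid(sample_vectors), centroid(example_vectors))
    return 1.0 / (1.0 + dist)


def quality_metric(heuristic, difference, weight=0.5):
    """Assumed combination: weighted average of the two scores."""
    return weight * heuristic + (1.0 - weight) * difference


def recommend(records, rules, fields, example_vectors, threshold=0.7):
    """Return (recommend?, metric); recommend only when metric > threshold."""
    h = heuristic_quality_score(records, rules)
    d = difference_quality_score(to_vector_set(records, fields), example_vectors)
    metric = quality_metric(h, d)
    return metric > threshold, metric


# Hypothetical sample dataset from a data steward; one record has an invalid age.
sample = [
    {"age": 40, "hr": 70},
    {"age": 60, "hr": 80},
    {"age": -5, "hr": 75},
]
rules = [lambda r: 0 <= r["age"] <= 120]
# Hypothetical example vector set (e.g., derived from a synthetic dataset).
example = [[30.0, 74.0], [34.0, 76.0]]

ok, metric = recommend(sample, rules, ["age", "hr"], example)
```

In this sketch only vectors and scores are compared, which is in keeping with the example vector set being received rather than the underlying records of other data stewards. A real system would likely use a richer vectorization and a calibrated threshold.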