US 11,886,399 B2
Generating rules for data processing values of data fields from semantic labels of the data fields
John Joyce, Newton, MA (US); Marshall A. Isman, Newton, MA (US); and Sandrick Melbouci, Myersville, MD (US)
Assigned to Ab Initio Technology LLC, Lexington, MA (US)
Filed by Ab Initio Technology LLC, Lexington, MA (US)
Filed on Aug. 28, 2020, as Appl. No. 17/006,504.
Claims priority of provisional application 62/981,646, filed on Feb. 26, 2020.
Prior Publication US 2021/0263900 A1, Aug. 26, 2021
Int. Cl. G06F 16/00 (2019.01); G06F 16/215 (2019.01); G06F 16/28 (2019.01); G06N 5/04 (2023.01); G06N 20/00 (2019.01); G06F 16/22 (2019.01)
CPC G06F 16/215 (2019.01) [G06F 16/2228 (2019.01); G06F 16/285 (2019.01); G06N 5/04 (2013.01); G06N 20/00 (2019.01)]
40 Claims
 
1. A method for determining a data quality rule for values in a field of a data record in a set of data records based on a label associated with the field, the label indicating characteristics of the values of the field, the method being implemented by a data processing system and including:
retrieving a label index that associates a label with a set of one or more fields in a data record, wherein the label identifies a type of information expected in each field of the set of one or more fields;
accessing a data dictionary that associates the type of information indicated by the label with a set of attribute values representing requirements for values of the one or more fields associated with the label, the requirements including logical or syntactical characteristics of the values for the one or more fields; and
for a field of a particular data record:
identifying, by accessing the label index, a particular label associated with the field of the particular data record;
retrieving, from the data dictionary, an attribute value for the particular label, the attribute value specifying a particular requirement for the field; and
generating a data quality rule that, when executed, is configured to:
validate whether a value of the field satisfies the particular requirement represented by the attribute value, and
generate output data indicating whether the particular requirement is satisfied.
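The steps recited in claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Ab Initio's implementation. The label index and the data dictionary are modeled as plain dictionaries. The labels (`us_zip_code`, `date`, `email`), the attribute names (`pattern`, `max_length`, `allowed_values`) and the helper names (`generate_rule`, `rules_for_record`, `RuleResult`) are all hypothetical. Each generated rule is a closure that checks one field value against one attribute value and returns output data recording whether that requirement is satisfied.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical label index: field name -> semantic label identifying the
# type of information expected in that field.
LABEL_INDEX: Dict[str, str] = {
    "cust_zip": "us_zip_code",
    "signup_date": "date",
    "email_addr": "email",
}

# Hypothetical data dictionary: label -> attribute values that state
# syntactic or logical requirements for values of fields with that label.
DATA_DICTIONARY: Dict[str, Dict[str, Any]] = {
    "us_zip_code": {"pattern": r"\d{5}(-\d{4})?"},
    "date": {"pattern": r"\d{4}-\d{2}-\d{2}"},
    "email": {"pattern": r"[^@\s]+@[^@\s]+\.[^@\s]+", "max_length": 254},
}


@dataclass
class RuleResult:
    """Output data produced when a generated rule is executed."""
    field: str
    value: Any
    requirement: str
    satisfied: bool


Rule = Callable[[Dict[str, Any]], RuleResult]


def generate_rule(field: str, attribute: str, attr_value: Any) -> Rule:
    """Build a data quality rule that checks one requirement on one field."""
    if attribute == "pattern":
        regex = re.compile(attr_value)

        def check(v: Any) -> bool:
            # Syntactic requirement: the whole value must match the pattern.
            return isinstance(v, str) and regex.fullmatch(v) is not None
    elif attribute == "max_length":
        def check(v: Any) -> bool:
            return v is not None and len(str(v)) <= attr_value
    elif attribute == "allowed_values":
        allowed = set(attr_value)

        def check(v: Any) -> bool:
            # Logical requirement: the value must be one of a fixed set.
            return v in allowed
    else:
        raise ValueError(f"unsupported attribute: {attribute}")

    def rule(record: Dict[str, Any]) -> RuleResult:
        # Executing the rule validates the field value and emits output data.
        value = record.get(field)
        return RuleResult(field, value, f"{attribute}={attr_value!r}", check(value))

    return rule


def rules_for_record(
    record: Dict[str, Any],
    label_index: Dict[str, str],
    data_dictionary: Dict[str, Dict[str, Any]],
) -> List[Rule]:
    """Generate one rule per (field, attribute value) pair for a record."""
    rules: List[Rule] = []
    for field in record:
        label = label_index.get(field)  # identify the field's label
        if label is None:
            continue  # unlabeled fields get no generated rule
        # Retrieve each attribute value stored for the label.
        for attribute, attr_value in data_dictionary.get(label, {}).items():
            rules.append(generate_rule(field, attribute, attr_value))
    return rules


if __name__ == "__main__":
    record = {"cust_zip": "02139", "signup_date": "2020/02/26", "email_addr": "a@b.com"}
    for rule in rules_for_record(record, LABEL_INDEX, DATA_DICTIONARY):
        result = rule(record)
        print(result.field, result.requirement, result.satisfied)
```

In this example record, `signup_date` fails its pattern requirement because the value uses slashes rather than hyphens, while the other fields pass. Generating rules as closures keeps rule generation (label lookup and dictionary retrieval) separate from rule execution, which mirrors the claim's distinction between generating a rule and executing it.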