US 11,704,494 B2
Discovering a semantic meaning of data fields from profile data of the data fields
Christopher Thurston Butler, Ascot (GB); and Timothy Spencer Bush, North Leigh (GB)
Assigned to Ab Initio Technology LLC, Lexington, MA (US)
Filed by Ab Initio Technology LLC, Lexington, MA (US)
Filed on Feb. 19, 2020, as Appl. No. 16/794,361.
Claims priority of provisional application 62/855,233, filed on May 31, 2019.
Prior Publication US 2020/0380212 A1, Dec. 3, 2020
Int. Cl. G06F 40/30 (2020.01); G06F 16/93 (2019.01); G06F 16/908 (2019.01); G06N 20/00 (2019.01)
CPC G06F 40/30 (2020.01) [G06F 16/908 (2019.01); G06F 16/93 (2019.01); G06N 20/00 (2019.01)]
30 Claims
 
1. A method implemented by a data processing system for discovering a semantic meaning of data of a field included in one or more data sets, the method including:
identifying a field included in one or more data sets, with the field associated with an identifier; and
for that field:
profiling, by a data processing system, one or more data values of the field to generate a data profile for the field, with the data profile specifying one or more attributes of the one or more data values of the field;
accessing a plurality of tests, wherein a test specifies one or more given attributes and a label providing information about the one or more given attributes;
based on applying at least the plurality of tests to the data profile for the field, generating one or more label proposals for the field, wherein a label proposal includes a label that is proposed as providing a semantic meaning for the one or more data values of the field;
wherein a semantic meaning indicates what kind of data values are included in a given field;
determining a similarity among the one or more label proposals;
based at least on the similarity among the one or more label proposals, selecting a classification that specifies whether input is required in identifying the semantic meaning, from among the one or more label proposals, of the one or more data values of the field;
based on the classification, rendering a graphical user interface that requests input in identifying the semantic meaning for the one or more data values of the field or determining that no input is required;
identifying one of the label proposals as identifying the semantic meaning of the one or more data values of the field; and
storing, in a data store, the identifier of the field with the identified one of the one or more label proposals that identifies the semantic meaning of the one or more data values of the field.
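The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: every name here (`LabelTest`, `profile_field`, `TESTS`, `similarity`, `classify`, `discover`), the profile attributes, the example labels, the agreement-ratio similarity measure, and the threshold are assumptions chosen for illustration. The graphical user interface of the claim is stood in for by a hypothetical `ask_user` callback, and the data store by a plain dictionary.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass
class LabelTest:
    """A test: given attributes of a profile, plus the label it proposes."""
    label: str
    passes: Callable[[dict], bool]


def profile_field(name, values):
    """Profile the field's values into a small set of attributes (assumed)."""
    vals = [str(v) for v in values if v not in (None, "")]
    return {
        "field_name": name.lower(),
        "all_digits": bool(vals) and all(v.isdigit() for v in vals),
        "lengths": {len(v) for v in vals},
    }


# Hypothetical tests; two independent tests may propose the same label.
TESTS = [
    LabelTest("zip_code", lambda p: p["all_digits"] and p["lengths"] == {5}),
    LabelTest("zip_code", lambda p: "zip" in p["field_name"]),
    LabelTest("phone_number", lambda p: p["all_digits"] and p["lengths"] == {10}),
    LabelTest("phone_number", lambda p: "phone" in p["field_name"]),
]


def propose_labels(profile):
    """Apply every test to the profile and collect the proposed labels."""
    return [t.label for t in TESTS if t.passes(profile)]


def similarity(proposals):
    """Assumed measure: share of proposals that agree with the most common label."""
    if not proposals:
        return 0.0
    return Counter(proposals).most_common(1)[0][1] / len(proposals)


def classify(proposals, threshold=1.0):
    """Select a classification stating whether user input is required."""
    if similarity(proposals) >= threshold:
        return "no_input_required"
    return "input_required"


def discover(field_id, name, values, store, ask_user=None):
    """Run the steps for one field and store the identified label, if any."""
    proposals = propose_labels(profile_field(name, values))
    classification = classify(proposals)
    if classification == "no_input_required":
        label = Counter(proposals).most_common(1)[0][0]
    else:
        # Stand-in for rendering a GUI that requests input.
        label = ask_user(field_id, proposals) if ask_user else None
    if label is not None:
        store[field_id] = label  # identifier of the field stored with its label
    return classification, label
```

In this sketch, a field named `zip` holding five-digit values yields two agreeing proposals, so no input is requested. A field named `phone` holding five-digit values yields conflicting proposals, so the callback is consulted before anything is stored.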