US 12,456,016 B2
Discovering a semantic meaning of data fields from profile data of the data fields
Christopher Thurston Butler, Ascot (GB); and Timothy Spencer Bush, North Leigh (GB)
Assigned to Ab Initio Technology LLC, Lexington, MA (US)
Filed by Ab Initio Technology LLC, Lexington, MA (US)
Filed on May 24, 2023, as Appl. No. 18/201,545.
Application 18/201,545 is a continuation of application No. 16/794,361, filed on Feb. 19, 2020, granted, now 11,704,494.
Claims priority of provisional application 62/855,233, filed on May 31, 2019.
Prior Publication US 2023/0409835 A1, Dec. 21, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 40/30 (2020.01); G06F 16/908 (2019.01); G06F 16/93 (2019.01); G06N 20/00 (2019.01)
CPC G06F 40/30 (2020.01) [G06F 16/908 (2019.01); G06F 16/93 (2019.01); G06N 20/00 (2019.01)]
36 Claims
1. A method implemented by a data processing system for discovering a semantic meaning of data values of a field included in one or more data sets, the method including:
identifying a field included in one or more data sets, the field being associated with an identifier;
identifying a plurality of data values of the field, with at least a first one of the data values being distinct from a second one of the data values;
based on the plurality of data values of the field, determining one or more attributes of the data values of the field;
accessing a plurality of tests, wherein a test specifies a candidate semantic meaning for one or more given attributes;
wherein a semantic meaning indicates what kind of data values are included in a given field;
based on results of applying at least the plurality of tests to the determined one or more attributes for the field, specifying one or more candidate semantic meanings of the one or more data values of the field;
performing an analysis based on at least a candidate semantic meaning specified by a result of one of the tests and another candidate semantic meaning specified by a result of another one of the tests;
based on one or more results of the analysis, identifying one of the candidate semantic meanings as identifying the semantic meaning of the one or more data values of the field; and
storing, in a data store, the identifier of the field with the identified semantic meaning of the one or more data values of the field.
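The steps recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation. The attribute names, the three tests, the scoring lambdas, the 0.5 threshold, and the dict used as a data store are all hypothetical choices made for illustration. The claim does not say how tests score attributes or how the analysis compares candidates. Here the analysis simply compares the confidence scores of the competing candidates and keeps the highest.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Test:
    """A test pairs a candidate semantic meaning with a check on attributes."""
    meaning: str                       # candidate semantic meaning (hypothetical labels)
    score: Callable[[dict], float]     # maps field attributes to a confidence in [0, 1]


def profile(values: list) -> dict:
    """Determine attributes of a field's data values (a tiny, assumed profile)."""
    strs = [str(v) for v in values]
    n = len(strs)
    return {
        "count": n,
        "distinct": len(set(strs)),
        "digit_fraction": sum(s.isdigit() for s in strs) / n,
        "lengths": Counter(len(s) for s in strs),
        "date_fraction": sum(
            bool(re.fullmatch(r"\d{4}-\d{2}-\d{2}", s)) for s in strs
        ) / n,
    }


# Hypothetical tests; each one proposes a meaning for given attributes.
TESTS = [
    Test("us_zip_code", lambda a: a["digit_fraction"] * a["lengths"][5] / a["count"]),
    Test("iso_date", lambda a: a["date_fraction"]),
    Test("numeric_id", lambda a: a["digit_fraction"] * a["distinct"] / a["count"]),
]


def discover(field_id: str, values: list, store: dict,
             threshold: float = 0.5) -> Optional[str]:
    """Profile a field, apply the tests, analyze candidates, store the result."""
    if not values:
        return None
    attrs = profile(values)
    # Apply every test; keep meanings whose score clears the assumed threshold.
    candidates = {}
    for test in TESTS:
        score = test.score(attrs)
        if score >= threshold:
            candidates[test.meaning] = score
    if not candidates:
        return None
    # Analysis step (assumed): weigh candidates against one another by score.
    best = max(candidates, key=candidates.get)
    # Store the field identifier together with the identified meaning.
    store[field_id] = best
    return best


store = {}
zip_result = discover("cust.postal", ["02139", "10001", "94105", "02139"], store)
date_result = discover("cust.signup", ["2020-02-19", "2019-05-31"], store)
none_result = discover("cust.notes", ["abc", "def"], store)
```

In this sketch the postal field yields two candidates, `us_zip_code` and `numeric_id`, and the comparison between them is what selects `us_zip_code`. A real system would likely use richer profile data and a more careful analysis than a single maximum.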