| CPC G06F 40/30 (2020.01) [G06F 16/908 (2019.01); G06F 16/93 (2019.01); G06N 20/00 (2019.01)] | 36 Claims |

|
1. A method implemented by a data processing system for discovering a semantic meaning of data values of a field included in one or more data sets, the method including:
identifying a field included in one or more data sets, the field being associated with an identifier;
identifying a plurality of data values of the field, with at least a first one of the data values being distinct from a second one of the data values;
based on the plurality of data values of the field, determining one or more attributes of the data values of the field;
accessing a plurality of tests, wherein a test specifies a candidate semantic meaning for one or more given attributes;
wherein a semantic meaning indicates what kind of data values are included in a given field;
based on results of applying at least the plurality of tests to the determined one or more attributes for the field, specifying one or more candidate semantic meanings of the one or more data values of the field;
performing an analysis based on at least a candidate semantic meaning specified by a result of one of the tests and another candidate semantic meaning specified by a result of another one of the tests;
based on one or more results of the analysis, identifying one of the candidate semantic meanings as identifying the semantic meaning of the one or more data values of the field; and
storing, in a data store, the identifier of the field with the identified semantic meaning of the one or more data values of the field.
|