CPC G06F 40/30 (2020.01) [G06F 16/908 (2019.01); G06F 16/93 (2019.01); G06N 20/00 (2019.01)] | 30 Claims |
1. A method implemented by a data processing system for discovering a semantic meaning of data of a field included in one or more data sets, the method including:
identifying a field included in one or more data sets, with the field associated with an identifier; and
for that field:
profiling, by a data processing system, one or more data values of the field to generate a data profile for the field, with the data profile specifying one or more attributes of the one or more data values of the field;
accessing a plurality of tests, wherein a test specifies one or more given attributes and a label providing information about the one or more given attributes;
based on applying at least the plurality of tests to the data profile for the field, generating one or more label proposals for the field, wherein a label proposal includes a label that is proposed as providing a semantic meaning for the one or more data values of the field;
wherein a semantic meaning indicates what kind of data values are included in a given field;
determining a similarity among the one or more label proposals;
based at least on the similarity among the one or more label proposals, selecting a classification that specifies whether input is required in identifying the semantic meaning, from among the one or more label proposals, of the one or more data values of the field;
based on the classification, rendering a graphical user interface that requests input in identifying the semantic meaning for the one or more data values of the field or determining that no input is required;
identifying one of the label proposals as identifying the semantic meaning of the one or more data values of the field; and
storing, in a data store, the identifier of the field with the identified one of the one or more label proposals that identifies the semantic meaning of the one or more data values of the field.
|
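The steps recited in claim 1 can be illustrated with a minimal Python sketch. This is not the claimed implementation; it only shows one way the recited steps could fit together. Every name in it (`AttributeTest`, `profile_field`, `propose_labels`, `similarity`, `discover`), the three profile attributes, the agreement-ratio similarity measure, and the 0.75 threshold are assumptions introduced for illustration. The `ask_user` callback stands in for the graphical user interface, and a plain dictionary stands in for the data store.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeTest:
    """Hypothetical test: required profile attributes plus the label they imply."""
    attributes: dict  # attribute name -> required value
    label: str        # label proposed when every attribute matches


def profile_field(values):
    """Profile the data values of a field into a few illustrative attributes."""
    non_empty = [v for v in values if v]
    return {
        "all_digits": bool(non_empty) and all(v.isdigit() for v in non_empty),
        "max_length": max((len(v) for v in non_empty), default=0),
        "has_at_sign": bool(non_empty) and all("@" in v for v in non_empty),
    }


def propose_labels(profile, tests):
    """Apply each test to the profile; a fully matching test yields a label proposal."""
    return [
        t.label
        for t in tests
        if all(profile.get(name) == value for name, value in t.attributes.items())
    ]


def similarity(proposals):
    """Assumed measure: share of proposals that agree with the most common label."""
    if not proposals:
        return 0.0
    top_count = Counter(proposals).most_common(1)[0][1]
    return top_count / len(proposals)


def discover(field_id, values, tests, store, ask_user, threshold=0.75):
    """Run the profile, test, classify, and store steps for one field."""
    proposals = propose_labels(profile_field(values), tests)
    if not proposals:
        return None  # nothing to store in this sketch
    # Classification: is input required to pick among the proposals?
    if similarity(proposals) < threshold:
        label = ask_user(field_id, proposals)  # stand-in for the GUI request
    else:
        label = Counter(proposals).most_common(1)[0][0]
    store[field_id] = label  # identifier stored with the identified label
    return label


TESTS = [
    AttributeTest({"all_digits": True, "max_length": 5}, "zip_code"),
    AttributeTest({"all_digits": True}, "numeric_id"),
    AttributeTest({"has_at_sign": True}, "email"),
]

store = {}
pick_first = lambda field_id, proposals: proposals[0]  # simulated user choice
discover("cust.zip", ["02139", "10001", "94105"], TESTS, store, pick_first)
discover("cust.email", ["a@x.org", "b@y.com"], TESTS, store, pick_first)
print(store)
```

In this sketch the `cust.zip` field matches two tests whose labels disagree, so the similarity falls below the threshold and the simulated user is asked to choose. The `cust.email` field yields a single proposal, so no input is required.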