CPC G06F 16/2272 (2019.01) [G06F 16/221 (2019.01); G06F 16/2365 (2019.01)] | 20 Claims |
1. A computer-implemented method for automatically tagging data, the method comprising:
receiving a first set of data;
automatically generating, based on a part of the first set of data, a statistical summary of the first set of data, wherein the statistical summary includes:
a data-tagging pattern representing a value of the part of the first set of data, and
a first degree of generalizing data pattern for the data-tagging pattern;
interactively receiving, based on a user selection through a graphical user interface, a second set of data, wherein the second set of data is distinct from the first set of data;
generating, based on the data-tagging pattern in the statistical summary and data in the second set of data, a candidate data-tagging pattern;
selecting, based on a combination of a false negative rate associated with the candidate data-tagging pattern upon the data in the second set of data and a second degree of generalizing data pattern associated with the candidate data-tagging pattern upon the first set of data, the candidate data-tagging pattern as a data-tagging pattern;
automatically tagging, based on the selected data-tagging pattern for creating new tagging data, a data tag to a subset of the first set of data; and
presenting, through the graphical user interface, based on the automatically tagged data tag, at least part of the automatically tagged subset of the first set of data for storing as output of processing a search query.
|
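The selecting step of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that data-tagging patterns are regular expressions, that the "degree of generalizing data pattern" is the fraction of the first set a pattern matches, and that the combination is a weighted sum with the false negative rate on the user-selected second set. The names `generalize`, `match_rate`, `select_pattern`, and `weight`, along with the sample values, are hypothetical and chosen only for illustration.

```python
import re

def generalize(value: str, level: int) -> str:
    """Hypothetical generalization: 0 = literal, 1 = per-character classes,
    2 = runs of the same class collapsed to 'one or more'."""
    if level == 0:
        return re.escape(value)
    parts = []
    for ch in value:
        if ch.isdigit():
            parts.append(r"\d")
        elif ch.isalpha():
            parts.append("[A-Za-z]")
        else:
            parts.append(re.escape(ch))
    if level == 1:
        return "".join(parts)
    collapsed = []
    for p in parts:
        if not collapsed or collapsed[-1] != p + "+":
            collapsed.append(p + "+")
    return "".join(collapsed)

def match_rate(pattern: str, values: list[str]) -> float:
    """Fraction of values fully matched by the pattern."""
    if not values:
        return 0.0
    return sum(1 for v in values if re.fullmatch(pattern, v)) / len(values)

def select_pattern(seed: str, first_set: list[str], second_set: list[str],
                   weight: float = 0.5) -> str:
    """Pick the candidate minimizing (false negative rate on second_set)
    + weight * (degree of generalization on first_set). Assumed scoring rule."""
    # Candidates come from the seed value and from the user-selected examples.
    candidates = {generalize(v, lvl) for v in [seed, *second_set] for lvl in range(3)}

    def score(pattern: str) -> float:
        false_negative_rate = 1.0 - match_rate(pattern, second_set)
        generalization = match_rate(pattern, first_set)
        return false_negative_rate + weight * generalization

    # Sort first so ties are broken deterministically.
    return min(sorted(candidates), key=score)

# Illustrative data only.
first_set = ["AB-123", "CD-456", "hello", "EF-7890", "42"]
second_set = ["CD-456", "EF-7890"]   # values the user selected in the GUI
selected = select_pattern("AB-123", first_set, second_set)

# Tag the subset of the first set that the selected pattern matches.
tagged = [v for v in first_set if re.fullmatch(selected, v)]
print(selected, tagged)
```

With this scoring, a pattern that misses user-selected examples is penalized through the false negative rate, and a pattern that matches too much of the first set is penalized through the generalization term. The weight controlling that trade-off is an assumption of the sketch and is not recited in the claim.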