CPC G06F 16/285 (2019.01) [G06F 16/221 (2019.01); G06F 16/2264 (2019.01); G06N 20/00 (2019.01)] | 20 Claims |
1. A method comprising:
generating, by a computing device based on predetermined rules, a first fingerprint of a first data column in a data set to be classified, the first fingerprint comprising column dimensions, wherein each of the column dimensions is assigned an attribute representing a characteristic of data in the first data column;
determining, by the computing device, that the first fingerprint matches one or more target fingerprints by comparing the first fingerprint to the target fingerprints, wherein each target fingerprint is associated with a class and includes class dimensions, each class dimension is assigned an attribute representing a characteristic of data in the class, and each of the one or more target fingerprints further includes a weight assigned to each of the class dimensions based on a frequency distribution of each of the class dimensions; and
assigning, by the computing device, one or more classes to the first data column based on the one or more target fingerprints, thereby generating classified data.
|
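The three steps of claim 1 (fingerprint a column, compare it against weighted target fingerprints, assign classes) can be illustrated with a minimal Python sketch. Nothing below comes from the patent itself: the three rules (`dtype`, `length`, `has_at`), the names `fingerprint`, `build_target`, `match_score` and `classify`, the choice of weight (the share of example columns that show a class dimension's most common attribute), and the 0.7 threshold are all assumptions made for illustration only.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical "predetermined rules": each maps a column of strings
# to a single attribute for one dimension. The claim does not name them.
def _dtype(values):
    if all(v.isdigit() for v in values):
        return "numeric"
    if all(v.isalpha() for v in values):
        return "alpha"
    return "mixed"

def _length(values):
    lengths = {len(v) for v in values}
    return f"fixed:{lengths.pop()}" if len(lengths) == 1 else "variable"

def _has_at(values):
    return "yes" if all("@" in v for v in values) else "no"

RULES = {"dtype": _dtype, "length": _length, "has_at": _has_at}

def fingerprint(values):
    """Step 1: one attribute per column dimension."""
    return {dim: rule(values) for dim, rule in RULES.items()}

@dataclass
class TargetFingerprint:
    label: str        # the class this fingerprint stands for
    attributes: dict  # class dimension -> expected attribute
    weights: dict     # class dimension -> weight in [0, 1]

def build_target(label, example_columns):
    """Assumed weighting: for each dimension, take the most frequent
    attribute across the example columns and use its relative
    frequency as the weight of that dimension."""
    prints = [fingerprint(col) for col in example_columns]
    attributes, weights = {}, {}
    for dim in RULES:
        attr, freq = Counter(fp[dim] for fp in prints).most_common(1)[0]
        attributes[dim] = attr
        weights[dim] = freq / len(prints)
    return TargetFingerprint(label, attributes, weights)

def match_score(fp, target):
    """Step 2: weighted share of dimensions whose attributes agree."""
    total = sum(target.weights.values())
    hit = sum(w for dim, w in target.weights.items()
              if fp[dim] == target.attributes[dim])
    return hit / total

def classify(values, targets, threshold=0.7):
    """Step 3: assign every class whose target fingerprint matches.
    The threshold is an arbitrary illustrative value."""
    fp = fingerprint(values)
    return [t.label for t in targets if match_score(fp, t) >= threshold]
```

In this sketch a dimension that varies across a class's example columns receives a lower weight, so a mismatch on it costs less than a mismatch on a dimension that is constant for the class. Calling `classify(column_values, targets)` with targets produced by `build_target` returns a list of zero or more class labels, mirroring the "one or more classes" language of the claim.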