US 11,886,468 B2
Fingerprint-based data classification
Xu Bin Cai, Beijing (CN); Xiaobo Wang, Beijing (CN); Chun Hua Sun, Beijing (CN); Yi Wang, Beijing (CN); and Wei Wang, Beijing (CN)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Dec. 3, 2021, as Appl. No. 17/541,704.
Prior Publication US 2023/0177071 A1, Jun. 8, 2023
Int. Cl. G06F 16/00 (2019.01); G06F 16/28 (2019.01); G06N 20/00 (2019.01); G06F 16/22 (2019.01)
CPC G06F 16/285 (2019.01) [G06F 16/221 (2019.01); G06F 16/2264 (2019.01); G06N 20/00 (2019.01)] 20 Claims
 
1. A method comprising:
generating, by a computing device based on predetermined rules, a first fingerprint of a first data column in a data set to be classified, the first fingerprint comprising column dimensions, wherein each of the column dimensions is assigned an attribute representing a characteristic of data in the data column;
determining, by the computing device, that the first fingerprint matches one or more target fingerprints by comparing the first fingerprint to the target fingerprints, wherein each target fingerprint is associated with a class and includes class dimensions, each class dimension is assigned an attribute representing a characteristic of data in the class, and each of the one or more target fingerprints further includes a weight assigned to each of the class dimensions based on a frequency distribution of each of the class dimensions; and
assigning, by the computing device, one or more classes to the data column based on the one or more target fingerprints, thereby generating classified data.
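The claim leaves open what the predetermined rules are, how fingerprints are compared, and how a frequency distribution becomes a weight. The Python sketch below fills those gaps with hypothetical choices so the three steps can be seen end to end. Every name in it (RULES, fingerprint, build_target, match_score, classify_column, the four example dimensions, and the 0.8 threshold) is illustrative and not taken from the patent. Two assumptions carry most of the weight. First, each class dimension takes the most common attribute seen across example columns of that class. Second, its weight is the share of those example columns that agree with that attribute. A column matches a class when the weighted fraction of agreeing dimensions reaches the threshold.

```python
from collections import Counter
from dataclasses import dataclass


def _length_bucket(values):
    # Hypothetical rule: bucket the average value length of the column.
    if not values:
        return "empty"
    avg = sum(len(v) for v in values) / len(values)
    return "short" if avg < 5 else "medium" if avg < 15 else "long"


# Hypothetical "predetermined rules": each maps a column (list of strings)
# to the attribute of one dimension.
RULES = {
    "all_digits": lambda vals: all(v.isdigit() for v in vals),
    "contains_at": lambda vals: all("@" in v for v in vals),
    "length_bucket": _length_bucket,
    "has_dash": lambda vals: any("-" in v for v in vals),
}


def fingerprint(values):
    """Step 1: column fingerprint, one attribute per column dimension."""
    return {dim: rule(values) for dim, rule in RULES.items()}


@dataclass
class TargetFingerprint:
    cls: str
    attributes: dict  # class dimension -> expected attribute
    weights: dict     # class dimension -> weight in [0, 1]


def build_target(cls, example_columns):
    """Assumed weighting: for each dimension, take the most frequent
    attribute among example columns of the class and weight it by the
    fraction of examples that share it."""
    fps = [fingerprint(col) for col in example_columns]
    attributes, weights = {}, {}
    for dim in RULES:
        value, count = Counter(fp[dim] for fp in fps).most_common(1)[0]
        attributes[dim] = value
        weights[dim] = count / len(fps)
    return TargetFingerprint(cls, attributes, weights)


def match_score(fp, target):
    """Step 2: weighted fraction of class dimensions whose attribute agrees."""
    total = sum(target.weights.values())
    hit = sum(w for dim, w in target.weights.items()
              if fp.get(dim) == target.attributes[dim])
    return hit / total if total else 0.0


def classify_column(values, targets, threshold=0.8):
    """Step 3: assign every class whose target fingerprint matches."""
    fp = fingerprint(values)
    return [t.cls for t in targets if match_score(fp, t) >= threshold]


targets = [
    build_target("email", [["a@x.com", "bob@y.org"], ["c@z.net"], ["x-y@z.com"]]),
    build_target("phone", [["555-1234", "555-9876"], ["010-2222"]]),
]
print(classify_column(["dave@q.io", "e@f.com"], targets))  # ['email']
print(classify_column(["123-4567"], targets))              # ['phone']
```

In this toy data, one of the three email examples contains a dash, so the email class gets has_dash = False with weight 2/3, while its other dimensions get weight 1. A dimension that varies within a class therefore counts for less when matching, which is one plausible reading of weighting "based on a frequency distribution". Because a column is compared against every target independently, more than one class can be assigned, consistent with "one or more classes" in the final step.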