US 11,675,976 B2
Exploitation of domain restrictions for data classification
Sigal Asaf, Zichron Yaakov (IL); Ariel Farkash, Shinshit (IL); Lev Greenberg, Haifa (IL); and Micha Gideon Moffie, Zichron Yaakov (IL)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Jul. 7, 2019, as Appl. No. 16/504,274.
Prior Publication US 2021/0004637 A1, Jan. 7, 2021
Int. Cl. G06F 40/279 (2020.01); G06N 5/02 (2023.01); G06F 18/22 (2023.01); G06F 18/2415 (2023.01)
CPC G06F 40/279 (2020.01) [G06F 18/22 (2023.01); G06F 18/2415 (2023.01); G06N 5/02 (2013.01)]
20 Claims
 
1. A method implemented in a computer system comprising a processor, memory accessible by the processor, computer program instructions stored in the memory and executable by the processor, and data stored in the memory and accessible by the processor, the method comprising:
obtaining, at the computer system, data including a plurality of data strings of a plurality of categories, the data strings in each category having a same string pattern;
determining a loose string format and a set of restrictions based on at least one string pattern;
classifying the plurality of data strings to respective categories based on the loose string format of the data strings and on the restrictions on the data strings of the categories by determining a category score indicating a match of a data string that matches the loose string format and meets the restrictions, wherein the classifying utilizes restriction information of non-selected categories when determining the matching of a selected category; and
decreasing the category score of the selected category if a mean restriction matching proportion for the selected category is less than a threshold amount above an expected mean restriction matching proportion.
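
The scoring steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `Category`, `luhn_ok`, `category_score`, `margin`, `penalty` and `expected_mean` are hypothetical, as are the card-number category and the numeric values chosen for it. The claim does not say how much the score is decreased, so halving it is an assumption. The use of restriction information from non-selected categories is not modeled here.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Pattern


@dataclass
class Category:
    name: str
    loose_format: Pattern[str]                  # loose string format
    restrictions: List[Callable[[str], bool]]   # domain restrictions
    expected_mean: float  # assumed chance rate of meeting the restrictions


def luhn_ok(s: str) -> bool:
    """Example restriction: Luhn checksum over a string of digits."""
    total = 0
    for i, ch in enumerate(reversed(s)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def category_score(values: List[str], cat: Category,
                   margin: float = 0.2, penalty: float = 0.5) -> float:
    # Strings that match the loose string format.
    loose = [v for v in values if cat.loose_format.fullmatch(v)]
    if not loose:
        return 0.0
    # Strings that match the loose format and meet every restriction.
    strict = [v for v in loose if all(r(v) for r in cat.restrictions)]
    score = len(strict) / len(values)
    # Mean restriction matching proportion over the loose matches.
    per_restriction = [sum(r(v) for v in loose) / len(loose)
                       for r in cat.restrictions]
    mean_match = (sum(per_restriction) / len(per_restriction)
                  if per_restriction else 1.0)
    # Decrease the score when the observed mean is less than `margin`
    # above the expected mean (assumption: multiply by `penalty`).
    if mean_match < cat.expected_mean + margin:
        score *= penalty
    return score


# Hypothetical category: 16 digits with a Luhn check. Random 16-digit
# strings pass the check about one time in ten.
CARD = Category("card_number", re.compile(r"\d{16}"), [luhn_ok],
                expected_mean=0.1)
```

Under these assumptions, a column of well-known test card numbers keeps its full score, while a column of arbitrary 16-digit strings that pass the checksum only at the chance rate has its score halved, even though every string matches the loose format.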