US 12,072,957 B2
Data classification system, data classification method, and recording medium
Genki Kusano, Tokyo (JP); and Tomoya Sakai, Tokyo (JP)
Assigned to NEC CORPORATION, Tokyo (JP)
Appl. No. 17/925,880
Filed by NEC Corporation, Tokyo (JP)
PCT Filed May 28, 2020, PCT No. PCT/JP2020/021055
§ 371(c)(1), (2) Date Nov. 17, 2022,
PCT Pub. No. WO2021/240707, PCT Pub. Date Dec. 2, 2021.
Prior Publication US 2023/0195851 A1, Jun. 22, 2023
Int. Cl. G06F 16/24 (2019.01); G06F 18/2431 (2023.01)
CPC G06F 18/2431 (2023.01)
3 Claims
 
1. A data classification system comprising:
a memory configured to store instructions; and
a processor configured to execute the instructions to:
calculate, for each of a plurality of known classes that appear in training data on which a machine learning model is trained, a known class likelihood that target data belongs to the known class among all the known classes, using the machine learning model;
calculate, for each of a plurality of unknown classes that do not appear in the training data on which the machine learning model is trained, a similarity of each known class to the unknown class;
calculate, for each unknown class, an unknown class likelihood that the target data belongs to the unknown class among all the unknown classes, based on the known class likelihood and the similarity of each known class to the unknown class;
select, as a plurality of candidate classes, one or more of the known classes and one or more of the unknown classes based on the known class likelihood of each known class and the unknown class likelihood of each unknown class,
wherein the selected known classes are a first number of the known classes for which the known class likelihood is highest, or are the known classes for which the known class likelihood is greater than a first threshold, and
wherein the selected unknown classes are a second number of the unknown classes for which the unknown class likelihood is highest, or are the unknown classes for which the unknown class likelihood is greater than a second threshold;
calculate, for each candidate class, an all-class likelihood that the target data belongs to the candidate class among all the candidate classes; and
estimate the candidate class to which the target data belongs based on the all-class likelihood of each candidate class.
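
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and the claim leaves the concrete formulas open, so every formula here is an assumption made for illustration: known class likelihoods come from a softmax over hypothetical model scores; similarity is the cosine similarity between hypothetical class embedding vectors, clipped at zero; the unknown class likelihood is a similarity-weighted sum of known class likelihoods, normalized over the unknown classes; candidates are chosen by the "first number" and "second number" (top-n) variant rather than the threshold variant; and the all-class likelihood mixes both groups with a hypothetical prior weight `p_known` and renormalizes over the candidates. All function and parameter names are invented for this example.

```python
import math


def softmax(scores):
    """Turn raw per-class scores into likelihoods that sum to 1."""
    m = max(scores.values())
    exps = {c: math.exp(s - m) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_n(likelihood, n):
    """Names of the n classes with the highest likelihood."""
    return sorted(likelihood, key=likelihood.get, reverse=True)[:n]


def classify(known_scores, known_emb, unknown_emb,
             n_known=2, n_unknown=2, p_known=0.5):
    # Known class likelihood among all known classes
    # (assumption: softmax over the model's scores).
    known_lik = softmax(known_scores)

    # Similarity of each known class to each unknown class
    # (assumption: cosine similarity of class embeddings, clipped at 0).
    sim = {u: {k: max(cosine(known_emb[k], vec), 0.0) for k in known_emb}
           for u, vec in unknown_emb.items()}

    # Unknown class likelihood among all unknown classes
    # (assumption: similarity-weighted sum, then normalization).
    raw = {u: sum(known_lik[k] * sim[u][k] for k in known_lik) for u in sim}
    total = sum(raw.values())
    unknown_lik = {u: (r / total if total else 1.0 / len(raw))
                   for u, r in raw.items()}

    # Candidate selection: top-n known and top-n unknown classes.
    cand_known = top_n(known_lik, n_known)
    cand_unknown = top_n(unknown_lik, n_unknown)

    # All-class likelihood among the candidates
    # (assumption: mix with prior weight p_known, then renormalize).
    weighted = {c: p_known * known_lik[c] for c in cand_known}
    weighted.update({c: (1.0 - p_known) * unknown_lik[c] for c in cand_unknown})
    z = sum(weighted.values())
    all_lik = {c: w / z for c, w in weighted.items()}

    # Estimate: the candidate with the highest all-class likelihood.
    return max(all_lik, key=all_lik.get), all_lik


# Toy data, invented for this example only.
known_scores = {"cat": 2.0, "dog": 1.0, "car": -1.0}
known_emb = {"cat": [1.0, 0.0], "dog": [0.8, 0.6], "car": [0.0, 1.0]}
unknown_emb = {"tiger": [0.9, 0.1], "truck": [0.1, 0.9]}

label, all_lik = classify(known_scores, known_emb, unknown_emb)
print(label, all_lik)
```

With `n_known=2`, the lowest-scoring known class ("car" in the toy data) is dropped before the final comparison, so the all-class likelihood is computed over four candidates only. Replacing `top_n` with a filter such as `likelihood[c] > threshold` would give the threshold variant recited in the two wherein clauses.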