US 12,081,570 B2
Classification device with adaptive clustering function and related computer program product
Ming-Chang Chiu, New Taipei (TW); Ming-Wei Wu, New Taipei (TW); Pei-Kan Tsung, New Taipei (TW); Che-Yu Lin, New Taipei (TW); and Cheng-Lin Yang, New Taipei (TW)
Assigned to CyCarrier Technology Co., Ltd., New Taipei (TW)
Filed by CyCarrier Technology Co., Ltd., New Taipei (TW)
Filed on Jul. 18, 2022, as Appl. No. 17/867,066.
Claims priority of provisional application 63/223,619, filed on Jul. 20, 2021.
Claims priority of application No. 111126132 (TW), filed on Jul. 12, 2022.
Prior Publication US 2023/0032070 A1, Feb. 2, 2023
Int. Cl. G06F 11/00 (2006.01); H04L 9/40 (2022.01)
CPC H04L 63/1425 (2013.01) [H04L 63/1416 (2013.01)]
10 Claims
 
1. A log classification device, configured to adaptively cluster a plurality of activities records collected from a target network system, wherein the plurality of activities records are respectively generated by a plurality of device activity reporting programs stored in a plurality of computing devices in the target network system, according to command lines received by the plurality of computing devices; and
the log classification device comprises:
a communication circuit, configured to receive the plurality of activities records through a network;
a storage circuit, configured to store a data analysis program;
a control circuit, coupling the communication circuit and the storage circuit, and configured to execute the data analysis program to generate a discrete space metric tree according to the plurality of activities records and perform a clustering operation on the discrete space metric tree to generate one or more event clusters associated with one or more suspicious event categories; and
an output device, configured to output the one or more event clusters and allow an information security incident diagnosis system to calculate similar feature information and differential feature information of a plurality of activities records in the one or more event clusters as auxiliary information for diagnosing whether there are intrusions or abnormalities in the target network system,
wherein the discrete space metric tree comprises a plurality of nodes, each node represents an activities record, and every two nodes are connected by an edge with a weighting coefficient;
the control circuit in the log classification device is further configured to perform a hierarchical similarity analysis operation to calculate a hierarchical edit distance (HED) between two to-be-analyzed activities records;
the control circuit in the log classification device is further configured to, when the discrete space metric tree is generated, perform a hierarchical similarity analysis operation on two to-be-analyzed tokens corresponding to nodes at both ends of each edge in the discrete space metric tree to generate an HED, and set the HED as a weighting coefficient of the edge; and
the hierarchical similarity analysis operation comprises:
interpreting the two to-be-analyzed activities records into a plurality of first tokens and a plurality of second tokens;
calculating a normalized edit distance (NED) between each first token and each second token, the NED being a numerical value between 0 and 1; and
calculating the HED of the two to-be-analyzed activities records according to the NED between each first token and each second token.
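For illustration only, the three steps of the hierarchical similarity analysis operation recited above might be sketched in Python as follows. The claim does not specify how records are interpreted into tokens or how the NED values are combined into the HED, so the choices here are assumptions: records are split on whitespace, the NED is taken as the character-level Levenshtein distance divided by the longer token's length, and the HED is taken as a token-level edit distance in which substituting one token for another costs their NED and inserting or deleting a token costs 1. The function names `levenshtein`, `ned`, and `hed` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance; insert, delete, and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def ned(token1: str, token2: str) -> float:
    """Normalized edit distance between two tokens, a value between 0 and 1.

    Assumption: normalization divides by the length of the longer token.
    """
    longest = max(len(token1), len(token2))
    if longest == 0:
        return 0.0
    return levenshtein(token1, token2) / longest


def hed(record1: str, record2: str) -> float:
    """Hierarchical edit distance between two activities records.

    Assumptions: tokens are whitespace-separated; the HED is a token-level
    edit distance whose substitution cost is the NED of the two tokens,
    with insertion and deletion of a token each costing 1.
    """
    first_tokens = record1.split()
    second_tokens = record2.split()
    prev = [float(j) for j in range(len(second_tokens) + 1)]
    for i, t1 in enumerate(first_tokens, start=1):
        curr = [float(i)]
        for j, t2 in enumerate(second_tokens, start=1):
            curr.append(min(
                prev[j] + 1.0,              # drop a first token
                curr[j - 1] + 1.0,          # add a second token
                prev[j - 1] + ned(t1, t2),  # align the two tokens
            ))
        prev = curr
    return prev[-1]
```

Under these assumptions, two command lines that differ by a single character in one token yield a small HED, while an extra or missing token adds a full unit. A value computed this way could serve as the weighting coefficient of the edge joining the two corresponding nodes.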