US 12,147,537 B2
Automated identification of malware families based on shared evidences
Yu-Siang Chen, Minxiong Township (TW); Ci-Hao Wu, Taipei (TW); Ying-Chen Yu, Taipei (TW); Pao-Chuan Liao, Taipei (TW); and June-Ray Lin, Taipei (TW)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Dec. 12, 2023, as Appl. No. 18/536,736.
Application 18/536,736 is a continuation of application No. 17/489,725, filed on Sep. 29, 2021, granted, now Pat. No. 11,899,791.
Prior Publication US 2024/0176880 A1, May 30, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 21/00 (2013.01); G06F 21/56 (2013.01); G06N 5/02 (2023.01); G06N 5/04 (2023.01)
CPC G06F 21/561 (2013.01) [G06F 21/568 (2013.01); G06N 5/02 (2013.01); G06N 5/04 (2013.01)]
20 Claims
 
1. A computer implemented method, comprising:
constructing a graph data structure comprising detected tag nodes and malware family nodes, and comprising indirect relationships between detected tags and malware families, wherein each detected tag node has one or more outgoing links (OGLs) to malware family nodes;
building a dictionary data structure comprising detected tag entries linking each detected tag to one or more malware family nodes based on the graph data structure;
identifying significant indirect entities (SIEs) within the detected tag entries of the dictionary data structure;
selecting a first SIE, of a plurality of SIEs, as a root node in a family tree data structure;
recursively connecting other SIEs, of the plurality of SIEs, to the root node in the family tree data structure based on OGLs of the SIEs in the plurality of SIEs; and
generating an identifier of the family tree data structure based on SIE identifiers for the SIEs.
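
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and the claim leaves several details open, so the following are assumptions made only for illustration: the input is a list of hypothetical (tag, family) observations; a tag counts as an SIE when it links to at least `min_families` families; the root is the SIE with the most OGLs; an SIE is attached under a parent when the two share at least one malware family; and the tree identifier is a SHA-256 digest of the sorted SIE identifiers. All function and parameter names are hypothetical.

```python
import hashlib
from collections import defaultdict


def build_graph(observations):
    """Graph as adjacency sets: detected tag node -> malware family nodes (its OGLs)."""
    graph = defaultdict(set)
    for tag, family in observations:
        graph[tag].add(family)
    return graph


def build_dictionary(graph):
    """Dictionary entries: detected tag -> sorted list of linked malware families."""
    return {tag: sorted(families) for tag, families in graph.items()}


def identify_sies(dictionary, min_families=2):
    """Assumed criterion: a tag is significant if it links to >= min_families families."""
    return sorted(t for t, fams in dictionary.items() if len(fams) >= min_families)


def build_family_tree(sies, dictionary):
    """Select a root SIE, then recursively attach SIEs that share an OGL target."""
    if not sies:
        return None, {}
    # Assumed root choice: most OGLs, ties broken alphabetically.
    root = max(sorted(sies), key=lambda t: len(dictionary[t]))
    children = {root: []}
    visited = {root}

    def connect(parent):
        for candidate in sorted(sies):
            if candidate in visited:
                continue
            # Assumed linkage rule: parent and candidate point to a common family.
            if set(dictionary[parent]) & set(dictionary[candidate]):
                visited.add(candidate)
                children[parent].append(candidate)
                children[candidate] = []
                connect(candidate)

    connect(root)
    return root, children


def tree_identifier(children):
    """Assumed scheme: SHA-256 over the sorted identifiers of the SIEs in the tree."""
    joined = "|".join(sorted(children))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical observations: (detected tag, malware family) pairs.
observations = [
    ("tag:mutex-A", "FamilyX"), ("tag:mutex-A", "FamilyY"),
    ("tag:c2-domain", "FamilyY"), ("tag:c2-domain", "FamilyZ"),
    ("tag:packer", "FamilyZ"), ("tag:packer", "FamilyW"),
    ("tag:rare", "FamilyQ"),
]
dictionary = build_dictionary(build_graph(observations))
sies = identify_sies(dictionary)
root, children = build_family_tree(sies, dictionary)
tree_id = tree_identifier(children)
```

In this example, `tag:rare` links to a single family and is therefore excluded under the assumed threshold, while the remaining three tags form one tree rooted at `tag:c2-domain`. Hashing the sorted identifiers makes the tree identifier independent of the order in which SIEs were attached; other identifier schemes would satisfy the claim language equally well.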