US 11,941,115 B2
Automatic vulnerability detection based on clustering of applications with similar structures and data flows
Jack Lawson Bishop, III, Evanston, IL (US); Anthony Herron, Upper Marlboro, MD (US); Yao Houkpati, Woodbridge, VA (US); and Carrie E. Gates, Livermore, CA (US)
Assigned to Bank of America Corporation, Charlotte, NC (US)
Filed by BANK OF AMERICA CORPORATION, Charlotte, NC (US)
Filed on Nov. 29, 2021, as Appl. No. 17/537,042.
Prior Publication US 2023/0169164 A1, Jun. 1, 2023
Int. Cl. G06F 21/55 (2013.01)
CPC G06F 21/554 (2013.01) [G06F 2221/033 (2013.01)]
20 Claims
1. A system comprising:
a database configured to store a plurality of source code segments comprising a first source code segment and a second source code segment;
a memory configured to store a vulnerability finding for the first source code segment, the vulnerability finding generated through static application security testing (SAST) of the first source code segment and classified as a real vulnerability by an external review; and
a hardware processor communicatively coupled to the memory and to the database, the hardware processor configured to:
generate a plurality of source code fingerprints, each source code fingerprint of the plurality of source code fingerprints corresponding to a source code segment of the plurality of source code segments, wherein generating the source code fingerprint comprises:
generating, from the corresponding source code segment, an abstract syntax tree;
performing a data flow analysis on the corresponding source code segment, to generate information identifying flows of data through the corresponding source code segment;
augmenting the abstract syntax tree associated with the source code segment with the information identifying the flows of data through the source code segment; and
flattening the augmented abstract syntax tree associated with the source code segment;
apply a machine learning clustering algorithm to the plurality of source code fingerprints to group the plurality of source code fingerprints into a plurality of clusters, each cluster of the plurality of clusters comprising one or more source code fingerprints, each of the one or more source code fingerprints of the cluster sharing one or more features identified by the machine learning clustering algorithm;
determine that both the source code fingerprint corresponding to the first source code segment and the source code fingerprint corresponding to the second source code segment belong to a first cluster of the plurality of clusters; and
in response to determining that both the source code fingerprint corresponding to the first source code segment and the source code fingerprint corresponding to the second source code segment belong to the first cluster of the plurality of clusters, transmit an alert to a device of an administrator, the alert identifying the second source code segment as vulnerable to the real vulnerability.
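The claim leaves the data flow analysis, the flattening scheme, and the clustering algorithm unspecified. The following is a minimal sketch of the claimed pipeline under stated assumptions, not the patentee's implementation. The assumptions are: the segments are Python source parsed with the standard `ast` module; the "data flow analysis" is a toy def-use pass over simple assignments; the flattened tree becomes a bag-of-tokens vector compared by cosine similarity; and single-linkage threshold clustering stands in for the unnamed "machine learning clustering algorithm". Every function name (`data_flow_analysis`, `augment`, `flatten`, `fingerprint`, `cluster`, `alerts`) is hypothetical, and `print` stands in for transmitting an alert to an administrator's device.

```python
import ast
import math
from collections import Counter


def data_flow_analysis(tree):
    """Toy def-use pass (assumption): for each simple assignment, record the
    names read on the right-hand side; their data flows into the target."""
    flows = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            sources = sorted({n.id for n in ast.walk(node.value)
                              if isinstance(n, ast.Name)
                              and isinstance(n.ctx, ast.Load)})
            flows.append((node, sources))
    return flows


def augment(flows):
    """Augment the AST: attach the data-flow information to its nodes."""
    for node, sources in flows:
        node.df_sources = sources


def flatten(node):
    """Pre-order traversal emitting node-type tokens. Identifiers are dropped,
    so renamed but structurally identical code yields identical tokens."""
    token = type(node).__name__
    if hasattr(node, "df_sources"):
        token += f"|df={len(node.df_sources)}"
    tokens = [token]
    for child in ast.iter_child_nodes(node):
        tokens.extend(flatten(child))
    return tokens


def fingerprint(source):
    tree = ast.parse(source)
    augment(data_flow_analysis(tree))
    return Counter(flatten(tree))  # bag-of-tokens fingerprint vector


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(fps, threshold=0.9):
    """Single-linkage clustering: union any pair whose similarity meets the
    threshold (a stand-in for the claim's unnamed clustering algorithm)."""
    parent = list(range(len(fps)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(fps)):
        for j in range(i + 1, len(fps)):
            if cosine(fps[i], fps[j]) >= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(fps))]


def alerts(names, labels, confirmed):
    """Flag every segment sharing a cluster with a segment whose SAST finding
    was confirmed as a real vulnerability by external review."""
    msgs = []
    for j, src in enumerate(names):
        if src not in confirmed:
            continue
        for i, name in enumerate(names):
            if i != j and labels[i] == labels[j]:
                msgs.append(f"ALERT: {name} may share the confirmed "
                            f"vulnerability in {src}")
    return msgs


# Hypothetical segments: a.py and b.py differ only in identifier names.
segments = {
    "a.py": "q = input()\nsql = 'SELECT ' + q\nrun(sql)\n",
    "b.py": "u = input()\ncmd = 'SELECT ' + u\nrun(cmd)\n",
    "c.py": "def add(x, y):\n    return x + y\n",
}
names = list(segments)
labels = cluster([fingerprint(segments[n]) for n in names])
for msg in alerts(names, labels, confirmed={"a.py"}):
    print(msg)  # stand-in for alerting an administrator's device
```

Because identifiers are discarded during flattening, `b.py` lands in the same cluster as the confirmed-vulnerable `a.py` and is flagged, while the structurally unrelated `c.py` is not. A production system would more likely use a richer interprocedural data flow analysis and a learned embedding in place of raw token counts.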