US 12,229,277 B2
Source code clustering for automatically identifying false positives generated through static application security testing
Jack Lawson Bishop, III, Evanston, IL (US); Anthony Herron, Upper Marlboro, MD (US); Yao Houkpati, Woodbridge, VA (US); and Carrie E. Gates, Livermore, CA (US)
Assigned to Bank of America Corporation, Charlotte, NC (US)
Filed by BANK OF AMERICA CORPORATION, Charlotte, NC (US)
Filed on Jan. 10, 2024, as Appl. No. 18/409,538.
Application 18/409,538 is a continuation of application No. 17/536,916, filed on Nov. 29, 2021, now Pat. No. 11,928,221.
Prior Publication US 2024/0143786 A1, May 2, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 21/57 (2013.01); G06F 8/41 (2018.01); G06F 8/75 (2018.01); G06F 21/56 (2013.01); H04L 9/40 (2022.01)
CPC G06F 21/577 (2013.01) [G06F 8/427 (2013.01); G06F 8/75 (2013.01); G06F 21/563 (2013.01); H04L 63/1433 (2013.01); G06F 2221/033 (2013.01); G06F 2221/034 (2013.01)]
20 Claims
 
1. A system comprising:
a database configured to store a plurality of source code segments comprising a first source code segment and a second source code segment;
a memory configured to store:
a first plurality of vulnerability findings for the first source code segment, wherein:
a first vulnerability finding of the first plurality of vulnerability findings has been classified as a real vulnerability by an external review; and
a second vulnerability finding of the first plurality of vulnerability findings has been classified as a false positive by the external review; and
a second plurality of vulnerability findings for the second source code segment; and
a hardware processor communicatively coupled to the memory and to the database, the hardware processor configured to:
generate a plurality of source code fingerprints, each source code fingerprint of the plurality of source code fingerprints corresponding to a source code segment of the plurality of source code segments, wherein generating the source code fingerprint comprises:
generating, from the corresponding source code segment, an abstract syntax tree;
performing a data flow analysis on the corresponding source code segment, to generate information identifying flows of data through the corresponding source code segment;
augmenting the abstract syntax tree associated with the source code segment with the information identifying the flows of data through the source code segment; and
flattening the augmented abstract syntax tree associated with the source code segment;
determine that the source code fingerprint corresponding to the first source code segment matches the source code fingerprint corresponding to the second source code segment; and
in response to determining that the source code fingerprint corresponding to the first source code segment matches the source code fingerprint corresponding to the second source code segment:
automatically classify a first vulnerability finding of the second plurality of vulnerability findings as the real vulnerability, in response to determining that the first vulnerability finding of the second plurality of vulnerability findings matches the first vulnerability finding of the first plurality of vulnerability findings; and
automatically classify a second vulnerability finding of the second plurality of vulnerability findings as the false positive, in response to determining that the second vulnerability finding of the second plurality of vulnerability findings matches the second vulnerability finding of the first plurality of vulnerability findings.
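The processing recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and it rests on several assumptions that the claim does not state: the source code segments are Python, parsed with the standard `ast` module; the data flow analysis is a toy def-use pass over straight-line code that ignores branching; two fingerprints match when their SHA-256 digests are equal; and two vulnerability findings match when they share a hypothetical `(rule, sink_index)` key. The names `fingerprint`, `classify_by_cluster`, and the sample segments are invented for the illustration.

```python
import ast
import hashlib


def fingerprint(source: str) -> str:
    """Illustrative fingerprint: flattened AST augmented with def-use links."""
    tree = ast.parse(source)   # abstract syntax tree of the segment
    canon = {}                 # variable name -> order of first appearance
    last_def = {}              # variable name -> number of its latest write
    tokens = []                # flattened (pre-order) token sequence
    writes = 0

    def visit(node):
        nonlocal writes
        if isinstance(node, ast.Name):
            # Rename variables canonically so naming differences do not matter.
            idx = canon.setdefault(node.id, len(canon))
            if isinstance(node.ctx, ast.Store):
                writes += 1
                last_def[node.id] = writes
                tokens.append(f"Name:v{idx}:def{writes}")
            else:
                # Data flow annotation: which write reaches this read (0 = none seen).
                tokens.append(f"Name:v{idx}:use{last_def.get(node.id, 0)}")
            return
        tokens.append(type(node).__name__)
        if isinstance(node, ast.Assign):
            # Visit the right-hand side before recording the write to the target.
            children = [node.value, *node.targets]
        else:
            children = list(ast.iter_child_nodes(node))
        for child in children:
            visit(child)

    visit(tree)
    return hashlib.sha256(" ".join(tokens).encode("utf-8")).hexdigest()


def classify_by_cluster(reviewed_src, reviewed_labels, new_src, new_findings):
    """Copy reviewed labels to matching findings when the fingerprints match."""
    if fingerprint(reviewed_src) != fingerprint(new_src):
        return {}
    return {f: reviewed_labels[f] for f in new_findings if f in reviewed_labels}


# Hypothetical segments: seg_b differs from seg_a only in variable names.
seg_a = "q = 'SELECT ' + user\nrun(q)\n"
seg_b = "sql = 'SELECT ' + name\nrun(sql)\n"
seg_c = "q = 'SELECT ' + user\nq = clean(q)\nrun(q)\n"

# Labels assigned to the first segment's findings by an external review.
reviewed = {("SQLI", 1): "real vulnerability", ("XSS", 0): "false positive"}
result = classify_by_cluster(seg_a, reviewed, seg_b,
                             [("SQLI", 1), ("XSS", 0), ("CSRF", 2)])
```

In this sketch a finding with no reviewed counterpart, such as the hypothetical `("CSRF", 2)`, is left unclassified rather than guessed. Comparing digests for equality is only one way to decide that two fingerprints match; a similarity threshold over the flattened sequences is another option that the claim language would not exclude.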