US 11,954,485 B1
Classification of programming language code into basic constructs of source code and non-source code
Mayur Kadu, Burlington, MA (US); Harshad Sathe, Burlington, MA (US); Saheed Olanigan, Burlington, MA (US); and Jagat Parekh, Burlington, MA (US)
Assigned to Synopsys, Inc., Sunnyvale, CA (US)
Filed by Synopsys, Inc., Mountain View, CA (US)
Filed on Jun. 23, 2021, as Appl. No. 17/356,269.
Claims priority of provisional application 63/043,957, filed on Jun. 25, 2020.
Int. Cl. G06F 9/44 (2018.01); G06F 8/73 (2018.01); G06F 21/10 (2013.01); G06N 3/08 (2023.01); G06V 30/262 (2022.01)
CPC G06F 8/73 (2013.01) [G06F 21/10 (2013.01); G06N 3/08 (2013.01); G06V 30/274 (2022.01)]
20 Claims
 
1. A method for processing a source code file, the method comprising:
scanning the source code file to identify text lines;
analyzing, via one or more processors, the text lines with a classifier to identify one or more of the text lines that correspond to license information, and wherein the classifier is trained with sample source code files;
generating a subset of the text lines, wherein the subset excludes the one or more of the text lines identified as corresponding to the license information;
determining whether text lines within the subset are open source code by comparing the subset to a database, wherein the database includes a plurality of text lines associated with open source code; and
outputting first text lines of the text lines that are the open source code with at least one or more of a security risk and a compliance risk for the source code file.
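
The steps recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation. It makes several assumptions that do not come from the claim: the keyword check in `is_license_line` is a hypothetical stand-in for the trained classifier, the "database" is an in-memory dictionary keyed by exact line text, and the names `Finding`, `scan_lines`, and `process_source` are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# ASSUMPTION: the claim calls for a classifier trained with sample source
# code files. A keyword check is used here only so the sketch runs
# without a model; it is not the claimed classifier.
LICENSE_MARKERS = ("license", "copyright", "permission is hereby granted")


def is_license_line(line: str) -> bool:
    """Return True if the line looks like license information."""
    lowered = line.lower()
    return any(marker in lowered for marker in LICENSE_MARKERS)


@dataclass(frozen=True)
class Finding:
    """A line matched against the open source database, with its risks."""
    line_number: int
    text: str
    risks: Tuple[str, ...]


def scan_lines(source: str) -> List[Tuple[int, str]]:
    """Scan the file content into (line number, stripped text) pairs,
    skipping blank lines."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if line.strip()
    ]


def process_source(
    source: str, open_source_db: Dict[str, Tuple[str, ...]]
) -> List[Finding]:
    """Run the scan, classify, exclude, compare, and output steps."""
    lines = scan_lines(source)
    # Generate the subset that excludes lines classified as license text.
    subset = [(n, text) for n, text in lines if not is_license_line(text)]
    # ASSUMPTION: exact-match lookup stands in for the database comparison.
    return [
        Finding(n, text, open_source_db[text])
        for n, text in subset
        if text in open_source_db
    ]


# Hypothetical input file and database contents.
sample = (
    "# Copyright (c) 2020 Example Corp. Licensed under the MIT License.\n"
    "def add(a, b):\n"
    "    return a + b\n"
    "print(add(1, 2))\n"
)
db = {"return a + b": ("compliance risk",)}

findings = process_source(sample, db)
for finding in findings:
    print(finding.line_number, finding.text, finding.risks)
```

Excluding license lines before the database lookup keeps header text, which recurs across many unrelated projects, from being reported as matched open source code. A production system would likely replace the exact-match lookup with a fuzzier comparison, but the claim does not specify one.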