US 12,443,510 B2
Localizing vulnerabilities in source code at a token-level
Aaron Yue-Chiu Chan, Provo, UT (US); Anant Girish Kharkar, Huntersville, NC (US); Yevhen Mohylevskyy, Redmond, WA (US); Kalpathy Sitaraman Sivaraman, Bothell, WA (US); Neelakantan Sundaresan, Bellevue, WA (US); and Roshanak Zilouchian Moghaddam, Kirkland, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by MICROSOFT TECHNOLOGY LICENSING, LLC., Redmond, WA (US)
Filed on Jun. 12, 2023, as Appl. No. 18/208,619.
Prior Publication US 2024/0411666 A1, Dec. 12, 2024
Int. Cl. G06F 11/362 (2025.01)
CPC G06F 11/3624 (2013.01)
20 Claims
 
1. A computer-implemented method, comprising:
detecting a software vulnerability in a source code snippet and positions of tokens in the source code snippet attributable to the software vulnerability, wherein the detected software vulnerability is associated with a type, wherein the source code snippet includes a fixed-length portion of a source code program;
obtaining a plurality of few-shot examples comprising a first few-shot example and a second few-shot example, wherein the first few-shot example of the plurality of few-shot examples comprises a first source code snippet having the type of the detected software vulnerability, wherein the second few-shot example of the plurality of few-shot examples includes a second source code snippet without the detected software vulnerability;
creating an input to a large language model for the large language model to determine whether the source code snippet contains the detected software vulnerability, wherein the input includes the plurality of few-shot examples, the detected software vulnerability, the positions of the tokens in the source code snippet attributable to the detected software vulnerability, and the source code snippet;
obtaining a response from the large language model, given the input, wherein the response indicates whether the source code snippet contains the detected software vulnerability; and
upon the large language model identifying the detected software vulnerability in the source code snippet, generating repair code to remedy the detected software vulnerability in the source code snippet.
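
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Detection` and `FewShotExample` types, the prompt wording, the YES/NO answer convention, and the `llm` and `repair` callables are all hypothetical stand-ins introduced here for illustration. The sketch assumes an upstream detector has already produced the vulnerability type and the token positions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Detection:
    """Output of a hypothetical upstream detector (assumed, not from the claim text)."""
    snippet: str                # fixed-length portion of the source code program
    vuln_type: str              # type of the detected vulnerability, e.g. a CWE label
    token_positions: List[int]  # positions of tokens attributed to the vulnerability


@dataclass
class FewShotExample:
    snippet: str
    vuln_type: str    # empty string when the snippet has no vulnerability
    vulnerable: bool


def select_few_shots(pool: Sequence[FewShotExample], vuln_type: str) -> List[FewShotExample]:
    """Pick one example with the same vulnerability type and one without a vulnerability."""
    positive = next(e for e in pool if e.vulnerable and e.vuln_type == vuln_type)
    negative = next(e for e in pool if not e.vulnerable)
    return [positive, negative]


def build_prompt(detection: Detection, shots: Sequence[FewShotExample]) -> str:
    """Combine the few-shot examples, the detected type, the token positions, and the snippet."""
    parts = []
    for shot in shots:
        answer = "YES" if shot.vulnerable else "NO"
        parts.append(f"Code:\n{shot.snippet}\nContains {detection.vuln_type}? {answer}\n")
    positions = ", ".join(str(p) for p in detection.token_positions)
    parts.append(
        f"Code:\n{detection.snippet}\n"
        f"Suspected vulnerability: {detection.vuln_type}\n"
        f"Suspected token positions: {positions}\n"
        f"Contains {detection.vuln_type}? "
    )
    return "\n".join(parts)


def confirm_and_repair(
    detection: Detection,
    pool: Sequence[FewShotExample],
    llm: Callable[[str], str],            # hypothetical: prompt in, text response out
    repair: Callable[[Detection], str],   # hypothetical repair-code generator
) -> Optional[str]:
    """Ask the model to confirm the detection; generate repair code only if it agrees."""
    shots = select_few_shots(pool, detection.vuln_type)
    response = llm(build_prompt(detection, shots))
    if response.strip().upper().startswith("YES"):
        return repair(detection)
    return None
```

In this sketch the large language model acts as a second-stage check on the detector: repair code is produced only when the model's response confirms the vulnerability, and `None` is returned otherwise.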