US 12,073,195 B2
Retrieval-augmented code completion
Nan Duan, Beijing (CN); Shuai Lu, Beijing (CN); Neelakantan Sundaresan, Bellevue, WA (US); and Alexey Svyatkovskiy, Bellevue, WA (US)
Assigned to Microsoft Technology Licensing, LLC., Redmond, WA (US)
Filed by MICROSOFT TECHNOLOGY LICENSING, LLC., Redmond, WA (US)
Filed on May 9, 2022, as Appl. No. 17/740,042.
Prior Publication US 2023/0359441 A1, Nov. 9, 2023
Int. Cl. G06F 8/33 (2018.01); G06F 40/30 (2020.01); G06N 3/045 (2023.01)
CPC G06F 8/33 (2013.01) [G06F 40/30 (2020.01); G06N 3/045 (2023.01)]
20 Claims
1. A system comprising:
a processor; and
a memory that stores a program that is configured to be executed by the processor, the program comprising instructions to perform actions that:
obtain a partially-formed source code snippet in a source code program to complete;
generate an embedding of the partially-formed source code snippet and a sparse vector of the partially-formed source code snippet;
search for a semantically-similar source code snippet to the partially-formed source code snippet in a retrieval source code database, wherein the retrieval source code database comprises a plurality of source code segments arranged in a consecutive order, wherein each source code segment in the retrieval source code database is accessed by a respective embedding and a respective sparse vector, wherein the search is based on matching the embedding of the partially-formed source code snippet with the embeddings of each source code segment and matching the sparse vector of the partially-formed source code snippet with the sparse vector of each source code segment;
select the semantically-similar source code snippet from the retrieval source code database having a closest match to the embedding of the partially-formed source code snippet and to the sparse vector of the partially-formed source code snippet;
obtain a source code segment from the retrieval source code database immediately following the selected semantically-similar source code; and
predict a candidate to complete the partially-formed source code snippet from a deep learning model given the partially-formed source code snippet and the source code segment that immediately follows the selected semantically-similar source code snippet in the retrieval source code database.
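
The retrieval steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The hashed character-trigram vector stands in for a learned embedding, the bag-of-tokens counter stands in for the sparse vector, and summing the two cosine scores is only one possible way to combine the matches. All function names (dense_embedding, sparse_vector, build_index, retrieve_following, build_model_input) and the DIM constant are hypothetical. The search skips the last segment because nothing follows it, which is also an assumption. The deep learning model call itself is left out; the sketch stops at assembling its input.

```python
import math
import re
import zlib
from collections import Counter

DIM = 256  # size of the toy dense vector (assumption, not from the claim)


def tokens(code):
    """Split code into identifiers and single non-space symbols."""
    return re.findall(r"[A-Za-z_]\w*|\S", code)


def sparse_vector(code):
    """Bag-of-tokens counts; a stand-in for the claimed sparse vector."""
    return Counter(tokens(code))


def dense_embedding(code):
    """Hashed character trigrams; a stand-in for a neural encoder embedding."""
    vec = [0.0] * DIM
    for i in range(len(code) - 2):
        vec[zlib.crc32(code[i:i + 3].encode()) % DIM] += 1.0
    return vec


def cosine_dense(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cosine_sparse(a, b):
    dot = sum(count * b[tok] for tok, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


def build_index(segments):
    """Keep segments in consecutive order, each with both vectors."""
    return [(seg, dense_embedding(seg), sparse_vector(seg)) for seg in segments]


def retrieve_following(index, partial):
    """Return (closest segment, the segment immediately following it)."""
    q_dense, q_sparse = dense_embedding(partial), sparse_vector(partial)

    def score(i):
        _, seg_dense, seg_sparse = index[i]
        # Assumed combination rule: add the dense and sparse similarities.
        return cosine_dense(q_dense, seg_dense) + cosine_sparse(q_sparse, seg_sparse)

    # The last segment has no successor, so it is not a candidate.
    best = max(range(len(index) - 1), key=score)
    return index[best][0], index[best + 1][0]


def build_model_input(partial, following):
    """Place the retrieved continuation ahead of the unfinished snippet."""
    return following + "\n" + partial


segments = [
    "def read_json(path):",
    "    with open(path) as f:\n        return json.load(f)",
    "def write_csv(rows, path):",
    "    with open(path, 'w') as f:\n        csv.writer(f).writerows(rows)",
]
index = build_index(segments)
partial = "def read_json(filename):"
matched, following = retrieve_following(index, partial)
model_input = build_model_input(partial, following)
```

In this toy run the partially-formed snippet matches the first stored segment, and the segment stored right after it (the function body) is what gets passed along with the snippet. The ordering of the retrieved segment and the snippet inside the model input is a design choice of the sketch, not something claim 1 specifies.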