US 12,242,822 B2
Custom models for source code generation via prefix-tuning
Colin Bruce Clement, Seattle, WA (US); Neelakantan Sundaresan, Bellevue, WA (US); Alexey Svyatkovskiy, Bellevue, WA (US); Michele Tufano, Bellevue, WA (US); and Andrei Zlotchevski, Brossard (CA)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by MICROSOFT TECHNOLOGY LICENSING, LLC, Redmond, WA (US)
Filed on Mar. 13, 2024, as Appl. No. 18/603,214.
Application 18/603,214 is a continuation of application No. 17/535,391, filed on Nov. 24, 2021, granted, now 11,947,935.
Prior Publication US 2024/0220215 A1, Jul. 4, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 9/44 (2018.01); G06F 8/35 (2018.01); G06N 3/084 (2023.01); G06N 3/048 (2023.01)
CPC G06F 8/35 (2013.01) [G06N 3/084 (2013.01); G06N 3/048 (2023.01)] 20 Claims
OG exemplary drawing
 
1. A system comprising:
a processor; and
a memory that stores a program configured to be executed by the processor, the program comprising instructions that, when executed, perform acts that:
obtain, in a local execution environment, a pre-trained deep learning model trained to perform a first source code generation task, wherein the pre-trained deep learning model comprises a plurality of transformer blocks, wherein the plurality of transformer blocks comprises a plurality of model parameters;
receive, from a remote execution environment, tuning data to tune the pre-trained deep learning model to perform a second source code generation task, wherein the tuning data comprises a plurality of trainable parameters and source code samples, wherein the plurality of trainable parameters is separate from the plurality of model parameters;
tune, in the local execution environment, the pre-trained deep learning model to learn to perform the second source code generation task through application of the tuning data to the plurality of transformer blocks, wherein the plurality of model parameters is kept frozen and the plurality of trainable parameters is updated in the plurality of transformer blocks, wherein the local execution environment differs from the remote execution environment;
output hidden states from a last transformer block of the plurality of transformer blocks to the remote execution environment;
receive from the remote execution environment an error loss obtained from the outputted hidden states; and
update the plurality of trainable parameters in each of the plurality of transformer blocks based on the error loss.
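The following is a minimal, illustrative sketch of the workflow recited in claim 1, written in Python with PyTorch; it is not the patented implementation. The class names (PrefixBlock, PrefixTunedModel), dimensions, and the placeholder remote objective are hypothetical, the prefix parameters are initialized locally for brevity rather than received as tuning data from the remote environment, and the "error loss" returned by the remote side is modeled as the gradient of a loss with respect to the outputted hidden states.

# Sketch: prefix-tuning a frozen pre-trained transformer for a new code
# generation task, with the loss computed remotely from the hidden states
# of the last transformer block. Names and shapes are illustrative only.

import torch
import torch.nn as nn

class PrefixBlock(nn.Module):
    """One transformer block whose pre-trained weights stay frozen; only the
    learned prefix vectors prepended to its input sequence are trainable."""
    def __init__(self, d_model=64, n_heads=4, prefix_len=8):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        for p in self.block.parameters():           # freeze pre-trained weights
            p.requires_grad_(False)
        self.prefix = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)

    def forward(self, x):
        batch = x.size(0)
        prefix = self.prefix.unsqueeze(0).expand(batch, -1, -1)
        out = self.block(torch.cat([prefix, x], dim=1))
        return out[:, self.prefix.size(0):, :]      # drop the prefix positions

class PrefixTunedModel(nn.Module):
    """Frozen embedding plus a stack of PrefixBlocks; returns the hidden
    states of the last transformer block."""
    def __init__(self, vocab=1000, d_model=64, n_blocks=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.embed.weight.requires_grad_(False)     # embeddings frozen as well
        self.blocks = nn.ModuleList([PrefixBlock(d_model) for _ in range(n_blocks)])

    def forward(self, token_ids):
        h = self.embed(token_ids)
        for blk in self.blocks:
            h = blk(h)
        return h                                    # hidden states, last block

# --- local execution environment ---------------------------------------
model = PrefixTunedModel()
trainable = [p for p in model.parameters() if p.requires_grad]   # prefixes only
opt = torch.optim.Adam(trainable, lr=1e-3)

tokens = torch.randint(0, 1000, (4, 16))            # stand-in tuning sample
hidden = model(tokens)                               # forward pass stays local

# --- remote execution environment (stand-in) ----------------------------
# The remote side scores the outputted hidden states and returns an error
# signal; simulated here as the gradient of a placeholder objective.
remote_hidden = hidden.detach().requires_grad_(True)
remote_loss = remote_hidden.pow(2).mean()            # placeholder objective
remote_loss.backward()
error_signal = remote_hidden.grad                     # returned to local side

# --- back in the local environment: update the prefixes only ------------
opt.zero_grad()
hidden.backward(error_signal)                         # propagate remote error
opt.step()                                            # base weights stay frozen

In this arrangement only the small per-block prefix tensors are trainable and exchanged, so the pre-trained model parameters never leave the local execution environment and are never modified, which is the separation the claim describes between the frozen model parameters and the trainable parameters updated from the remotely computed error loss.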