CPC G06F 8/33 (2013.01) [G06F 18/217 (2023.01); G06N 3/04 (2013.01)] | 20 Claims |
1. A system comprising:
a processor; and
a memory that stores a program configured to be executed by the processor, the program comprising instructions to perform actions that:
access a first deep learning model previously trained to generate source code for a first source code task, wherein the first deep learning model comprises parameters learned through cross-entropy loss;
tune the parameters of the first deep learning model to train a second deep learning model to learn to generate source code for a second source code task, wherein tune the parameters of the first deep learning model to train the second deep learning model comprises instructions to perform actions that:
input a training sample to the first deep learning model and to the second deep learning model, wherein the first deep learning model predicts a first predicted source code snippet over T timesteps, wherein the second deep learning model predicts a second predicted source code snippet over T timesteps;
compute a code-quality reward for the second predicted source code snippet, wherein the code-quality reward is based on syntax correctness of the second predicted source code snippet, successful execution of the second predicted source code snippet, successful compilation of the second predicted source code snippet, and successful invocation of the second predicted source code snippet;
compute a reward for the second predicted source code snippet at each timestep t based on a divergence between an output distribution from the first deep learning model at each timestep t and an output distribution from the second deep learning model at each timestep t;
add the code-quality reward to the reward of the last timestep;
compute a policy loss based on the rewards of each timestep t; and
backpropagate the policy loss to the second deep learning model to adjust the parameters of the second deep learning model; and
deploy the second deep learning model in an inference system to generate source code for the second source code task.
|
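The reward steps recited in claim 1 (a per-timestep divergence reward, a code-quality reward added to the last timestep, and a policy loss over the per-timestep rewards) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the claim: the function names, the choice of KL divergence as the divergence, the negative sign and the `beta` scale on it, the REINFORCE-style loss, and the syntax-only quality check. The claim also names compilation, execution, and invocation as reward components, which are omitted here, as is the backpropagation step.

```python
import math


def softmax(logits):
    """Convert one timestep's logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions (assumed divergence choice)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def code_quality_reward(snippet):
    """Hypothetical quality check: +1.0 if the snippet parses as Python, else -1.0.

    Only syntax correctness is checked; compilation, execution, and
    invocation checks from the claim are left out of this sketch.
    """
    try:
        compile(snippet, "<predicted>", "exec")
        return 1.0
    except (SyntaxError, ValueError):
        return -1.0


def shaped_rewards(second_logits, first_logits, quality_reward, beta=0.1):
    """Per-timestep reward from the divergence between the two models' outputs.

    The -beta scaling is an assumption; the code-quality reward is added
    to the reward of the last timestep only.
    """
    rewards = []
    for second, first in zip(second_logits, first_logits):
        p = softmax(second)  # second (tuned) model at timestep t
        q = softmax(first)   # first (pre-trained) model at timestep t
        rewards.append(-beta * kl_divergence(p, q))
    rewards[-1] += quality_reward
    return rewards


def policy_loss(token_log_probs, rewards, gamma=1.0):
    """REINFORCE-style loss (an assumed stand-in for the claimed policy loss)."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running  # discounted return-to-go
        returns.append(running)
    returns.reverse()
    return -sum(lp * g for lp, g in zip(token_log_probs, returns))


if __name__ == "__main__":
    # Toy logits over a 3-token vocabulary for T = 2 timesteps.
    first = [[2.0, 0.5, 0.1], [0.3, 1.5, 0.2]]
    second = [[1.8, 0.7, 0.1], [0.1, 1.9, 0.2]]
    quality = code_quality_reward("def add(a, b):\n    return a + b\n")
    rewards = shaped_rewards(second, first, quality)
    print(rewards, policy_loss([-0.4, -0.3], rewards))
```

In this sketch the divergence term discourages the second model from drifting away from the first model's output distribution, while the quality term, applied only at the final timestep, scores the completed snippet as a whole.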