CPC G06F 8/33 (2013.01) [G06F 18/217 (2023.01); G06N 3/04 (2013.01)] | 20 Claims |
1. A system comprising:
one or more processors; and
a memory that stores one or more programs that are configured to be executed by the one or more processors, the one or more programs include instructions to perform actions that:
obtain a deep learning model previously trained to generate source code for a first source code task, wherein parameters of the deep learning model are learned through cross-entropy loss;
tune the parameters of the deep learning model to learn to generate source code for a second source code task, wherein the parameters of the deep learning model are tuned through reinforcement learning using a reward, wherein the reward for the source code generated by the deep learning model comprises a code-quality reward score based on a code quality factor and a source code metric of the generated source code; and
deploy the tuned deep learning model in an inference system to generate source code for the second source code task.
|
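The tuning step recited in claim 1 can be illustrated with a minimal sketch. The claim does not state how the code quality factor and the source code metric are combined, nor which reinforcement learning algorithm is used, so everything below is an assumption made for illustration: the quality factor is taken to be a syntax check, the metric is a token-overlap score against a reference, the two are multiplied, and the "model" is a toy softmax policy over a fixed list of candidate snippets updated with a plain REINFORCE rule. All function names are hypothetical. The earlier cross-entropy training stage is not shown.

```python
import ast
import math
import random


def quality_factor(code: str) -> float:
    """Hypothetical code quality factor: 1.0 if the snippet parses, else 0.0."""
    try:
        ast.parse(code)
        return 1.0
    except SyntaxError:
        return 0.0


def source_code_metric(code: str, reference: str) -> float:
    """Hypothetical metric: Jaccard overlap of whitespace-separated tokens."""
    a, b = set(code.split()), set(reference.split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def code_quality_reward(code: str, reference: str) -> float:
    # Assumption: the reward is the product of the factor and the metric.
    # The claim only says the score is "based on" both quantities.
    return quality_factor(code) * source_code_metric(code, reference)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, candidates, reference, lr, rng):
    """Sample one candidate and apply a REINFORCE update to the logits."""
    probs = softmax(logits)
    i = rng.choices(range(len(candidates)), weights=probs)[0]
    reward = code_quality_reward(candidates[i], reference)
    # d/d logit_j of log pi(i) is (1 if j == i else 0) - probs[j].
    return [
        logit + lr * reward * ((1.0 if j == i else 0.0) - p)
        for j, (logit, p) in enumerate(zip(logits, probs))
    ]


def tune(candidates, reference, steps=500, lr=0.5, seed=0):
    """Run several REINFORCE steps and return the final policy probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * len(candidates)  # stands in for the pretrained parameters
    for _ in range(steps):
        logits = reinforce_step(logits, candidates, reference, lr, rng)
    return softmax(logits)


if __name__ == "__main__":
    reference = "def add(a, b): return a + b"
    candidates = [
        "def add(a, b): return a + b",  # valid and matches the reference
        "def add(a, b) return a + b",   # syntax error, so reward is 0
        "def sub(a, b): return a - b",  # valid but only partly overlapping
    ]
    for code, p in zip(candidates, tune(candidates, reference)):
        print(f"{p:.3f}  {code}")
```

In this sketch the syntactically invalid candidate never earns a reward, so its logit can only fall as the other candidates are reinforced. A real system along the lines of the claim would replace the toy policy with the pretrained deep learning model and would likely use a more elaborate policy-gradient method and quality measure.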