CPC G06F 8/35 (2013.01) [G06F 11/3612 (2013.01)] | 30 Claims |
1. A computer-program product comprising a non-transitory machine-readable storage medium storing computer instructions that, when executed by one or more processors, perform operations comprising:
identifying a plurality of code synthesis items for a target programming language;
generating a code synthesis prompt based on a first sampling of the plurality of code synthesis items;
synthesizing, via a large language model, a plurality of raw code segments using the code synthesis prompt;
executing the plurality of raw code segments with a code interpreter associated with the target programming language;
determining one or more valid code segments of the plurality of raw code segments that the code interpreter successfully executed;
aggregating, via a second sampling, the one or more valid code segments into one or more validated code synthesis training samples, wherein a respective validated code synthesis training sample of the one or more validated code synthesis training samples at least includes:
a natural language description of a target coding task, and
one or more code segments that implement the target coding task; and
training a code generation model using the one or more validated code synthesis training samples, wherein:
the training the code generation model includes using supervised learning to train the code generation model; and
the supervised learning causes the code generation model to learn to map the natural language description of the target coding task to the one or more code segments that implement the target coding task.
|
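The data-generation operations recited in claim 1 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: `fake_llm`, `runs_ok`, and `build_training_samples` are hypothetical names, the large language model is replaced by a fixed list of candidate snippets, Python stands in for the target programming language, and the natural language description is a placeholder string. The supervised training step is omitted; the sketch stops at producing (description, code) pairs that such training could consume.

```python
import random
import subprocess
import sys


def fake_llm(prompt, n):
    """Hypothetical stand-in for a large language model call.

    A real system would send `prompt` to a model; here we return fixed
    candidates so the sketch is deterministic and runnable offline.
    """
    candidates = [
        "print(sum([1, 2, 3]))",
        "print(sorted([3, 1, 2]))",
        "print(undefined_name)",  # fails at run time (NameError)
        "def broken(:",           # fails to parse (SyntaxError)
    ]
    return candidates[:n]


def runs_ok(code, timeout=5.0):
    """Return True if the interpreter executes `code` with exit status 0."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def build_training_samples(items, rng, k_items=2, k_segments=1, n_raw=4):
    """Sample items, synthesize raw code, validate it, aggregate samples."""
    # First sampling: choose which code synthesis items go into the prompt.
    sampled = rng.sample(items, k_items)
    prompt = "Write Python snippets that use: " + ", ".join(sampled)

    # Synthesize raw code segments, then keep those that execute successfully.
    raw_segments = fake_llm(prompt, n_raw)
    valid_segments = [seg for seg in raw_segments if runs_ok(seg)]

    # Second sampling: pick valid segments to aggregate into training samples.
    chosen = rng.sample(valid_segments, min(k_segments, len(valid_segments)))
    samples = [
        {
            # Placeholder description (assumption); the claim does not say
            # how the natural language description is produced.
            "description": "Task using " + ", ".join(sampled),
            "code": seg,
        }
        for seg in chosen
    ]
    return samples, valid_segments


if __name__ == "__main__":
    rng = random.Random(0)
    samples, valid = build_training_samples(["list", "sum", "sorted"], rng)
    print(len(valid), "valid segments;", len(samples), "training sample(s)")
```

Running each candidate in a separate interpreter process keeps a failing or hanging snippet from affecting the pipeline itself, which is one plausible reading of "executing ... with a code interpreter"; a production system would additionally sandbox untrusted generated code.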