US 11,941,373 B2
Code generation through reinforcement learning using code-quality rewards
Shao Kun Deng, New York City, NY (US); Neelakantan Sundaresan, Bellevue, WA (US); Alexey Svyatkovskiy, Bellevue, WA (US); and Michele Tufano, Bellevue, WA (US)
Assigned to Microsoft Technology Licensing, LLC., Redmond, WA (US)
Filed by MICROSOFT TECHNOLOGY LICENSING, LLC., Redmond, WA (US)
Filed on Dec. 17, 2021, as Appl. No. 17/555,263.
Prior Publication US 2023/0195428 A1, Jun. 22, 2023
Int. Cl. G06F 8/33 (2018.01); G06F 18/21 (2023.01); G06N 3/04 (2023.01)

CPC G06F 8/33 (2013.01) [G06F 18/217 (2023.01); G06N 3/04 (2013.01)]

20 Claims

1. A system comprising:

one or more processors; and

a memory that stores one or more programs that are configured to be executed by the one or more processors, the one or more programs include instructions to perform actions that:

obtain a deep learning model previously trained to generate source code for a first source code task, wherein parameters of the deep learning model are learned through cross-entropy loss;

tune the parameters of the deep learning model to learn to generate source code for a second source code task, wherein the parameters of the deep learning model are tuned through reinforcement learning using a reward, wherein the reward for the source code generated by the deep learning model comprises a code-quality reward score based on a code quality factor and a source code metric of the generated source code; and

deploy the tuned deep learning model in an inference system to generate source code for the second source code task.
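
The tuning step of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a toy "model" that is only a softmax over a fixed list of candidate snippets, a hypothetical code quality factor (whether the snippet parses as Python), a hypothetical source code metric (token overlap with a reference), and an equal weighting of the two. The names `syntax_quality`, `token_overlap`, `reward`, and `reinforce_tune` are all illustrative and do not come from the patent. The update rule is a plain REINFORCE policy gradient with an expected-reward baseline, used here as a stand-in for whatever reinforcement learning method an actual system would use.

```python
import ast
import math
import random


def syntax_quality(code: str) -> float:
    """Hypothetical code quality factor: 1.0 if the snippet parses, else 0.0."""
    try:
        ast.parse(code)
        return 1.0
    except SyntaxError:
        return 0.0


def token_overlap(code: str, reference: str) -> float:
    """Hypothetical source code metric: share of candidate tokens found in the reference."""
    candidate_tokens = code.split()
    reference_tokens = set(reference.split())
    if not candidate_tokens:
        return 0.0
    hits = sum(token in reference_tokens for token in candidate_tokens)
    return hits / len(candidate_tokens)


def reward(code: str, reference: str, quality_weight: float = 0.5) -> float:
    """Combine the quality factor and the metric into one score (weighting is an assumption)."""
    return (quality_weight * syntax_quality(code)
            + (1.0 - quality_weight) * token_overlap(code, reference))


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_tune(candidates, reference, steps=500, lr=0.5, seed=0):
    """Tune a toy policy over fixed candidates with REINFORCE; returns final probabilities."""
    rng = random.Random(seed)
    # The zero logits stand in for parameters obtained from prior cross-entropy training.
    logits = [0.0] * len(candidates)
    rewards = [reward(c, reference) for c in candidates]
    for _ in range(steps):
        probs = softmax(logits)
        # Sample one "generated" snippet from the current policy.
        i = rng.choices(range(len(candidates)), weights=probs)[0]
        # Expected reward under the current policy serves as a variance-reducing baseline.
        baseline = sum(p * r for p, r in zip(probs, rewards))
        advantage = rewards[i] - baseline
        # Gradient of log pi(i) with respect to logit j is 1[i == j] - probs[j].
        for j in range(len(logits)):
            indicator = 1.0 if i == j else 0.0
            logits[j] += lr * advantage * (indicator - probs[j])
    return softmax(logits)


if __name__ == "__main__":
    reference = "def add(a, b): return a + b"
    candidates = [
        "def add(a, b): return a + b",   # parses and matches the reference
        "def add(a, b) return a + b",    # close to the reference but does not parse
        "x = 1",                         # parses but is unrelated to the reference
    ]
    for snippet, prob in zip(candidates, reinforce_tune(candidates, reference)):
        print(f"{prob:.3f}  {snippet}")
```

In this sketch the snippet that both parses and matches the reference receives the highest reward, so the policy gradient shifts probability toward it. A real system would replace the candidate list with sequences sampled from a neural model and backpropagate the same advantage-weighted log-probability through the model parameters.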