US 11,868,731 B2
Code retrieval based on multi-class classification
Mehdi Bahrami, San Jose, CA (US); and Wei-Peng Chen, Fremont, CA (US)
Assigned to FUJITSU LIMITED, Kawasaki (JP)
Filed by FUJITSU LIMITED, Kanagawa (JP)
Filed on Mar. 31, 2022, as Appl. No. 17/657,614.
Claims priority of provisional application 63/261,602, filed on Sep. 24, 2021.
Prior Publication US 2023/0100208 A1, Mar. 30, 2023
Int. Cl. G06F 40/30 (2020.01); G06F 8/41 (2018.01); G06F 40/216 (2020.01); G06F 18/22 (2023.01); G06F 18/20 (2023.01); G06F 18/23213 (2023.01); G06N 3/04 (2023.01); G06F 8/73 (2018.01); G06F 40/166 (2020.01); G06F 40/40 (2020.01); G06F 40/211 (2020.01); G06F 40/242 (2020.01); G06F 8/30 (2018.01); G06N 3/08 (2023.01); G06F 40/44 (2020.01); G06F 8/36 (2018.01); G06F 8/65 (2018.01); G06F 11/36 (2006.01); G06F 16/951 (2019.01)
CPC G06F 40/30 (2020.01) [G06F 8/30 (2013.01); G06F 8/36 (2013.01); G06F 8/42 (2013.01); G06F 8/436 (2013.01); G06F 8/65 (2013.01); G06F 8/73 (2013.01); G06F 11/3624 (2013.01); G06F 16/951 (2019.01); G06F 18/22 (2023.01); G06F 18/23213 (2023.01); G06F 18/285 (2023.01); G06F 40/166 (2020.01); G06F 40/211 (2020.01); G06F 40/216 (2020.01); G06F 40/242 (2020.01); G06F 40/40 (2020.01); G06F 40/44 (2020.01); G06N 3/04 (2013.01); G06N 3/08 (2013.01)] 20 Claims
1. A method, executed by a processor, comprising:
receiving a set of natural language (NL) descriptors and a corresponding set of programming language (PL) codes;
determining a first vector associated with each of the received set of NL descriptors, based on a first machine-learning language model;
determining a second vector associated with each of the received set of PL codes, based on a second machine-learning language model, wherein the second machine-learning language model is different from the first machine-learning language model;
determining, using a statistical model, a number of a set of semantic code classes to cluster the set of PL codes;
clustering the set of PL codes into the set of semantic code classes, based on the determined number, the determined first vector, and the determined second vector;
training a multi-class classifier model configured to predict a semantic code class, from the set of semantic code classes, corresponding to an input NL descriptor, wherein
the predicted semantic code class is associated with a PL code corresponding to the input NL descriptor, and
the multi-class classifier model is trained based on the set of NL descriptors, the set of PL codes corresponding to the set of NL descriptors, and the set of semantic code classes in which the set of PL codes are clustered;
selecting an intra-class predictor model from a set of intra-class predictor models, based on the predicted semantic code class; and
training the selected intra-class predictor model based on the input NL descriptor, wherein the selected intra-class predictor model is configured to predict the PL code corresponding to the input NL descriptor for source code re-use or retrieval.
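The steps of claim 1 can be sketched end to end in Python. This is a minimal, hypothetical illustration and not the patentee's implementation. The claim does not name the language models, the statistical model, the clustering algorithm, or the classifier, so the sketch makes several assumptions. It uses bag-of-words vectors, fitted separately on NL and PL text, in place of the two different language models. It uses the silhouette coefficient as the statistical model that picks the number of semantic code classes, and k-means on the concatenated NL and PL vectors for clustering. The multi-class classifier is a nearest-centroid classifier, and each class has a 1-nearest-neighbour intra-class predictor. All names (`BagOfWords`, `CodeRetriever`, `retrieve`, `update`, `max_k`) are illustrative.

```python
import math
import random
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z_]+")

class BagOfWords:
    # Hypothetical stand-in for a language model: L2-normalized token counts.
    def fit(self, texts):
        vocab = sorted({t for text in texts for t in TOKEN.findall(text.lower())})
        self.index = {tok: i for i, tok in enumerate(vocab)}
        return self

    def encode(self, text):
        vec = [0.0] * len(self.index)
        for tok in TOKEN.findall(text.lower()):
            if tok in self.index:  # tokens unseen during fit are ignored
                vec[self.index[tok]] += 1.0
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm else vec

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(points, k, seed, iters=50):
    """Lloyd's k-means with a seeded random init; returns one label per point."""
    centroids = random.Random(seed).sample(points, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = mean(members)
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient (assumed choice of 'statistical model')."""
    if len(set(labels)) < 2:
        return -1.0
    scores = []
    for i, p in enumerate(points):
        by_cluster = defaultdict(list)
        for j, q in enumerate(points):
            if j != i:
                by_cluster[labels[j]].append(dist(p, q))
        own = by_cluster.get(labels[i])
        if not own:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for c, d in by_cluster.items() if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

class CodeRetriever:
    def __init__(self, max_k=4, seed=0):
        self.max_k, self.seed = max_k, seed

    def fit(self, nl_descs, pl_codes):
        self.nl_model = BagOfWords().fit(nl_descs)  # "first" model, NL text only
        self.pl_model = BagOfWords().fit(pl_codes)  # "second", separate model, PL only
        nl_vecs = [self.nl_model.encode(t) for t in nl_descs]
        # Cluster on both vectors by concatenating them per (NL, PL) pair.
        joint = [v + self.pl_model.encode(c) for v, c in zip(nl_vecs, pl_codes)]
        ks = range(2, min(self.max_k, len(joint) - 1) + 1)
        self.k, self.labels = max(((k, kmeans(joint, k, self.seed)) for k in ks),
                                  key=lambda kl: silhouette(joint, kl[1]))
        # Per-class pairs back the intra-class predictors (1-nearest neighbour).
        self.members = defaultdict(list)
        for vec, code, lab in zip(nl_vecs, pl_codes, self.labels):
            self.members[lab].append((vec, code))
        # Multi-class classifier: nearest centroid of each class's NL vectors.
        self.centroids = {c: mean([v for v, _ in m]) for c, m in self.members.items()}
        return self

    def predict_class(self, q):
        return max(self.centroids, key=lambda c: cosine(q, self.centroids[c]))

    def retrieve(self, nl_desc):
        q = self.nl_model.encode(nl_desc)
        # Select the intra-class predictor of the predicted class, then query it.
        pairs = self.members[self.predict_class(q)]
        return max(pairs, key=lambda vc: cosine(q, vc[0]))[1]

    def update(self, nl_desc, pl_code):
        # Train only the intra-class predictor selected by the predicted class.
        q = self.nl_model.encode(nl_desc)
        self.members[self.predict_class(q)].append((q, pl_code))
```

In this sketch, `retrieve` scans only the pairs stored for the predicted class rather than every stored pair, and `update` extends only the predictor selected for the new descriptor's class. The centroids are not recomputed after `update`.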