US 12,468,980 B2
Complexity based artificial intelligence model training
Sahil Suneja, Ossining, NY (US); Yufan Zhuang, New York, NY (US); Yunhui Zheng, Chappaqua, NY (US); Alessandro Morari, New York, NY (US); and Jim Alain Laredo, Katonah, NY (US)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Sep. 30, 2021, as Appl. No. 17/491,275.
Prior Publication US 2023/0115723 A1, Apr. 13, 2023
Int. Cl. G06N 20/00 (2019.01); G06F 18/24 (2023.01); G06N 20/20 (2019.01)
CPC G06N 20/00 (2019.01) [G06F 18/24 (2023.01); G06N 20/20 (2019.01)]
20 Claims
 
1. A system, comprising:
a memory that stores computer executable components; and
a processor, operably coupled to the memory, and that executes at least one of the computer executable components that:
selects, from a group of complexity code metrics, a defined code complexity metric that when employed for ranking training samples to train neural network models for source code understanding tasks reduces false positive errors and false negative errors in performing the source code understanding tasks by the neural network models as compared to unranked training samples for training the neural network models;
determines respective code complexity values of training source code samples based on the defined code complexity metric, wherein the respective code complexity value of a training source code sample represents a measure of complexity of source code of the training source code sample;
ranks the training source code samples based on the respective code complexity values; and
iteratively trains, using the training source code samples and beginning with a least complex training source code sample in the ranking according to the respective code complexity values, a neural network model to perform a source code understanding task associated with understanding a defined task-relevant aspect of source code samples, wherein each subsequent iteration employs a next least complex training source code sample in the ranking according to the respective code complexity values to mitigate the false positive errors and the false negative errors in the source code understanding task by the neural network model.
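The determining, ranking, and iterative-training steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation. The `complexity` function is a hypothetical stand-in for the "defined code complexity metric" (a rough count of branching constructs), and `train_step` is a hypothetical callback standing in for one update of a neural network model. The step of selecting a metric from a group by comparing false positive and false negative errors is not covered here.

```python
import re
from typing import Callable, List, Sequence, Tuple

# A training sample: (source code text, label for the understanding task).
Sample = Tuple[str, int]

# Hypothetical complexity metric (an assumption, not taken from the patent):
# one plus the number of branching constructs, a rough cyclomatic-style count.
_BRANCH = re.compile(r"\b(?:if|for|while|case|catch)\b|&&|\|\|")


def complexity(source: str) -> int:
    """Return a crude complexity value for one source code sample."""
    return 1 + sum(1 for _ in _BRANCH.finditer(source))


def rank_by_complexity(samples: Sequence[Sample]) -> List[Sample]:
    """Order samples from least to most complex (ties keep input order)."""
    return sorted(samples, key=lambda s: complexity(s[0]))


def curriculum_train(
    samples: Sequence[Sample],
    train_step: Callable[[str, int], None],
) -> List[int]:
    """Feed samples to train_step one at a time, least complex first.

    Returns the complexity values in the order they were presented.
    """
    presented = []
    for source, label in rank_by_complexity(samples):
        train_step(source, label)  # hypothetical single model update
        presented.append(complexity(source))
    return presented


if __name__ == "__main__":
    data = [
        ("if (a && b) { for (;;) { if (c) x(); } }", 1),
        ("return 0;", 0),
        ("if (p) free(p);", 0),
    ]
    seen: List[str] = []
    order = curriculum_train(data, lambda src, y: seen.append(src))
    print(order)    # complexity values in presentation order
    print(seen[0])  # the least complex sample is presented first
```

In practice `train_step` would wrap an optimizer update on the model, and the metric would be whichever one the selection step chose. The sketch only fixes the presentation order.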