| CPC G06F 21/554 (2013.01) [G06F 18/214 (2023.01); G06F 18/2178 (2023.01); G06F 21/552 (2013.01)] | 20 Claims |

|
1. A method for identifying vulnerabilities in computer program code, said method comprising
forming a training data set using semi-supervised learning (SSL) comprising the sub-steps of
receiving labeled text data from a first database set, wherein the labeled text data comprises input (x) and label (y),
receiving unlabeled text data from a second database set, wherein the unlabeled text data comprises the input (x), and wherein the unlabeled text data comprises sets of posts generated by a plurality of users,
tokenizing at least one of the labeled text data and the unlabeled text data to form a plurality of tokens;
removing from the plurality of tokens a set of noise tokens to form a filtered set of tokens, a frequency of occurrence of the set of noise tokens causing random noise in the training data;
combining the unlabeled text data and the labeled text data into the training data set, the training data set comprising the filtered set of tokens but excluding the set of noise tokens,
training a model based on the training data set comprising the sub-step of minimizing a loss function (L) of the training set, wherein the loss function comprises parameters (Θ) used in the model,
applying the model on the computer program code such that the vulnerabilities are identified.
|
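
By way of illustration only, the steps recited in claim 1 can be sketched as follows. The claim does not specify the tokenizer, the criterion for identifying noise tokens, the semi-supervised technique, or the form of the model, so every such choice below is an assumption: a regular-expression tokenizer, a document-frequency threshold (`min_docs`) as the noise criterion, pseudo-labeling as the SSL technique, and a bag-of-words logistic regression whose cross-entropy loss L is minimized over parameters Θ (`theta`) by gradient descent. All function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase word-like tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9_]+", text.lower())

def find_noise_tokens(token_lists, min_docs=2):
    """Assumption: a token seen in fewer than min_docs texts counts as noise."""
    doc_freq = Counter(tok for toks in token_lists for tok in set(toks))
    return {tok for tok, n in doc_freq.items() if n < min_docs}

def featurize(tokens, vocab):
    """Bag-of-words count vector over the filtered vocabulary."""
    vec = [0.0] * len(vocab)
    for tok in tokens:
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

def predict(theta, x):
    """Logistic model; theta[-1] is the bias, the rest are token weights."""
    z = theta[-1] + sum(w * v for w, v in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))

def loss(theta, xs, ys):
    """Mean cross-entropy loss L(theta) over the training data set."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(xs, ys):
        p = predict(theta, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)

def train(xs, ys, lr=0.1, epochs=200):
    """Minimize the loss by batch gradient descent, starting from zeros."""
    theta = [0.0] * (len(xs[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for x, y in zip(xs, ys):
            err = predict(theta, x) - y
            for i, v in enumerate(x):
                grad[i] += err * v
            grad[-1] += err
        theta = [t - lr * g / len(xs) for t, g in zip(theta, grad)]
    return theta

def build_and_train(labeled, unlabeled, confidence=0.6):
    """Form the training data set from labeled (x, y) and unlabeled x, then train."""
    labeled_tokens = [tokenize(x) for x, _ in labeled]
    unlabeled_tokens = [tokenize(x) for x in unlabeled]
    all_tokens = labeled_tokens + unlabeled_tokens
    noise = find_noise_tokens(all_tokens)
    kept = sorted({t for toks in all_tokens for t in toks} - noise)
    vocab = {tok: i for i, tok in enumerate(kept)}  # filtered set of tokens
    xs = [featurize(toks, vocab) for toks in labeled_tokens]
    ys = [y for _, y in labeled]
    theta = train(xs, ys)
    # Pseudo-labeling (assumed SSL technique): add confidently scored
    # unlabeled inputs to the training data set with the predicted label.
    for toks in unlabeled_tokens:
        x = featurize(toks, vocab)
        p = predict(theta, x)
        if p >= confidence or p <= 1.0 - confidence:
            xs.append(x)
            ys.append(1 if p >= 0.5 else 0)
    theta = train(xs, ys)
    return theta, vocab, xs, ys

def score_code(theta, vocab, source_code):
    """Apply the model to program code; returns a vulnerability score in [0, 1]."""
    return predict(theta, featurize(tokenize(source_code), vocab))
```

In this sketch, `build_and_train` would be called with a list of `(text, label)` pairs and a list of user-generated posts, and `score_code` would then be applied to a source-code string; the threshold at which a score is treated as an identified vulnerability is left open, as the claim does not state one.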