US 12,265,612 B2
Method for identifying vulnerabilities in computer program code and a system thereof
Anton Duppils, Södra Sandby (SE); Magnus Tullberg, Lund (SE); and Carl Emil Orm Wåreus, Lund (SE)
Assigned to DEBRICKED AB, (SE)
Appl. No. 17/759,019
Filed by debricked AB, Malmö (SE)
PCT Filed Jan. 22, 2021, PCT No. PCT/EP2021/051488
§ 371(c)(1), (2) Date Jul. 18, 2022,
PCT Pub. No. WO2021/148625, PCT Pub. Date Jul. 29, 2021.
Claims priority of application No. 2050062-5 (SE), filed on Jan. 23, 2020.
Prior Publication US 2023/0036159 A1, Feb. 2, 2023
Int. Cl. G06F 21/00 (2013.01); G06F 18/21 (2023.01); G06F 18/214 (2023.01); G06F 21/55 (2013.01)
CPC G06F 21/554 (2013.01) [G06F 18/214 (2023.01); G06F 18/2178 (2023.01); G06F 21/552 (2013.01)] 20 Claims
 
1. A method for identifying vulnerabilities in computer program code, said method comprising
forming a training data set using semi-supervised learning (SSL) comprising the sub-steps of
receiving labeled text data from a first database set, wherein the labeled text data comprises input (x) and label (y),
receiving unlabeled text data from a second database set wherein the unlabeled text data comprises the input (x), and wherein the unlabeled text data comprises sets of posts generated by a plurality of users,
tokenizing at least one of the labeled text data and unlabeled text data to form a plurality of tokens;
removing from the plurality of tokens a set of noise tokens to form a filtered set of tokens, a frequency of occurrence of the set of noise tokens causing random noise in the training data;
combining the unlabeled text data and the labeled text data into the training data set, the training data set comprising the filtered set of tokens but excluding the set of noise tokens,
training a model based on the training data set comprising the sub-step of minimizing a loss function (L) of the training set, wherein the loss function comprises parameters (Θ) used in the model,
applying the model on the computer program code such that the vulnerabilities are identified.
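
The data-preparation sub-steps recited above (tokenizing, removing noise tokens, and combining labeled and unlabeled text into one training data set) can be illustrated with a minimal sketch. It is not the patented implementation. The claim does not say how noise tokens are selected, so the rule used here (a token is noise if it occurs fewer than `min_count` times across all text) is purely an assumption, as are the names `tokenize`, `find_noise_tokens`, `build_training_set`, and the sample posts. Unlabeled items carry the label `None`.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def find_noise_tokens(token_lists, min_count=2):
    """Assumed criterion (not from the claim): tokens occurring fewer than
    min_count times across all inputs are treated as noise."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    return {tok for tok, count in counts.items() if count < min_count}


def build_training_set(labeled, unlabeled, min_count=2):
    """Combine labeled (x, y) pairs and unlabeled x inputs into one training
    set of (filtered_tokens, label) pairs; label is None when unlabeled."""
    labeled_tokens = [tokenize(x) for x, _ in labeled]
    unlabeled_tokens = [tokenize(x) for x in unlabeled]
    noise = find_noise_tokens(labeled_tokens + unlabeled_tokens, min_count)

    def keep(tokens):
        # Drop every token that was classified as noise.
        return [tok for tok in tokens if tok not in noise]

    training = [(keep(toks), y) for toks, (_, y) in zip(labeled_tokens, labeled)]
    training += [(keep(toks), None) for toks in unlabeled_tokens]
    return training, noise


# Hypothetical sample data: two labeled items and one unlabeled user post.
labeled = [("Buffer overflow in parser", 1), ("Update readme typo", 0)]
unlabeled = ["Possible overflow in parser when input is long"]

training, noise = build_training_set(labeled, unlabeled)
for tokens, label in training:
    print(label, tokens)
```

With this sample data only `overflow`, `in`, and `parser` occur at least twice, so every other token is filtered out. A model trained on the resulting set would then minimize its loss function L over the parameters Θ; that step is omitted here because the claim does not fix a model or loss form.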