US 12,436,865 B2
Natural language processing engine for automated detection of source code discrepancies
Marcus Raphael Matos, Richardson, TX (US); Jack Lawson Bishop, III, Evanston, IL (US); Robert Cain Durbin, Jr., New Hope, PA (US); Daniel Joseph Serna, The Colony, TX (US); Benjamin Tweel, Romeoville, IL (US); and Jake Michael Yara, Mint Hill, NC (US)
Filed by BANK OF AMERICA CORPORATION, Charlotte, NC (US)
Filed on Jan. 3, 2023, as Appl. No. 18/092,519.
Prior Publication US 2024/0220393 A1, Jul. 4, 2024
Int. Cl. G06F 11/3604 (2025.01); G06F 8/65 (2018.01); G06F 40/166 (2020.01); G06F 40/194 (2020.01); G06F 40/40 (2020.01); H04L 41/0686 (2022.01)
CPC G06F 11/3608 (2013.01) [G06F 8/65 (2013.01); G06F 40/166 (2020.01); G06F 40/194 (2020.01); G06F 40/40 (2020.01); H04L 41/0686 (2013.01)]
18 Claims
1. A system for automated detection of source code discrepancies, the system comprising:
at least one non-transitory storage device; and
at least one processor coupled to the at least one non-transitory storage device, wherein the at least one processor is configured to:
train a machine learning engine to output a plurality of identified updates associated with a source code file, wherein training the machine learning engine comprises:
receiving, via a data acquisition engine, a training dataset comprising software code, wherein the data acquisition engine is configured to identify a location of the training dataset and identify one or more connection characteristics associated with access of the training dataset;
converting, via a data pre-processing engine, the training dataset from a non-standardized format to a standardized format;
processing, via the data pre-processing engine, the converted training dataset to generate an extracted feature set using a dimensionality reduction technique;
executing, using a machine learning model tuning engine, a plurality of testing cycles using the extracted feature set, wherein the machine learning model tuning engine is configured to vary one or more testing parameters for each testing cycle of the plurality of testing cycles; and
deploying a trained machine learning engine into a production environment;
receive a data transmission comprising a text file and the source code file, wherein the text file and the source code file are associated with a software update;
process the source code file via the trained machine learning engine, wherein an output of the trained machine learning engine comprises the plurality of identified updates and a first impact score associated with the plurality of identified updates;
process the text file via a natural language processing engine, wherein an output of the natural language processing engine comprises a plurality of expected updates and a second impact score associated with the plurality of expected updates;
identify a difference between the first impact score and the second impact score; and
based on the identified difference, perform a remedial action, wherein the remedial action comprises preventing a network device from installing the software update.
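
The final three steps of claim 1 (two impact scores, their difference, and a remedial action) can be illustrated with a minimal sketch. This is not the patented implementation. The names EngineOutput, score_difference, and should_block_install are hypothetical, and so is the numeric threshold: the claim says only that the remedial action is "based on the identified difference" and does not specify how the difference is evaluated. The scores are assumed here to be plain floating-point numbers on a common scale.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EngineOutput:
    """Updates reported by one engine plus its impact score (hypothetical type)."""
    updates: List[str]
    impact_score: float


def score_difference(identified: EngineOutput, expected: EngineOutput) -> float:
    """Absolute gap between the code-derived and text-derived impact scores."""
    return abs(identified.impact_score - expected.impact_score)


def should_block_install(
    identified: EngineOutput,
    expected: EngineOutput,
    threshold: float = 0.2,  # assumed cutoff; the claim does not specify one
) -> bool:
    """Return True when the gap is large enough to withhold the update."""
    return score_difference(identified, expected) > threshold
```

In this sketch, identified stands for the output of the trained machine learning engine on the source code file, and expected stands for the output of the natural language processing engine on the text file. A True result corresponds to the remedial action of preventing a network device from installing the software update.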