US 12,008,365 B2
Systems and method for automated code analysis and tagging
Sean Moran, London (GB); Sanat Saha, Mumbai (IN); Gaurav Singh, Mumbai (IN); Fanny Silavong, London (GB); Antonios Georgiadis, London (GB); Ganesh Chandrasekar, London (GB); Andy Alexander, St Albans (GB); Rob Otter, Witham (GB); and Brett Sanford, Tunbridge Wells (GB)
Assigned to JPMORGAN CHASE BANK, N.A., New York, NY (US)
Filed by JPMORGAN CHASE BANK, N.A., New York, NY (US)
Filed on Feb. 14, 2022, as Appl. No. 17/651,005.
Claims priority of application No. 20210100433 (GR), filed on Jun. 28, 2021.
Prior Publication US 2023/0259359 A1, Aug. 17, 2023
Int. Cl. G06F 8/75 (2018.01); G06F 8/41 (2018.01); G06F 17/16 (2006.01)
CPC G06F 8/75 (2013.01) [G06F 8/427 (2013.01); G06F 17/16 (2013.01)]
18 Claims
 
1. A method for automated code analysis and tagging, comprising:
receiving, by a code annotation computer program executed by a computer processor, a training code snippet from a training codebase;
parsing, by the code annotation computer program, the training code snippet into a data structure;
quantifying, by the code annotation computer program, the data structure;
parsing, by the code annotation computer program, a docstring associated with the training code snippet into a plurality of keywords;
quantifying, by the code annotation computer program, the plurality of keywords;
training, by the code annotation computer program, a code annotation model based on a similarity between the quantified data structure and a smoothing parameter for a Dirichlet prior smoothing estimate; and
generating an annotation for the training code snippet by matrix multiplication of a test code-training probability matrix and a training code-word probability matrix, the test code-training code probability matrix computed from a Gaussian kernel density estimate between each test code snippet and each training code snippet from the training codebase.
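
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is one possible reading under stated assumptions, not the patented implementation. The claim does not say how the data structure is quantified, so the sketch assumes the data structure is an abstract syntax tree and quantifies it as node-type counts. The function names, the smoothing parameter mu, the kernel bandwidth, and the sample snippets are all hypothetical. The training code-word matrix uses a Dirichlet prior smoothed estimate, the test code-training code matrix uses row-normalized Gaussian kernel weights, and the annotation scores are their matrix product.

```python
import ast
import math
import re
from collections import Counter


def code_features(source):
    """Parse a snippet into an AST and count node types (assumed quantification)."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


def docstring_keywords(source):
    """Return lowercase words from the first function docstring found."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            return re.findall(r"[a-z]+", (ast.get_docstring(node) or "").lower())
    return []


def word_given_code(keywords, vocab, background, mu):
    """Dirichlet prior smoothed P(word | training snippet); mu is the smoothing parameter."""
    counts = Counter(keywords)
    return [(counts[w] + mu * background[w]) / (len(keywords) + mu) for w in vocab]


def gaussian_kernel_rows(test_vecs, train_vecs, bandwidth):
    """Row-normalized Gaussian kernel weights, read as P(training snippet | test snippet)."""
    rows = []
    for t in test_vecs:
        weights = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(t, r)) / (2 * bandwidth ** 2))
            for r in train_vecs
        ]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def annotate(train_sources, test_sources, mu=1.0, bandwidth=1.0, top_k=2):
    """Return the top_k keywords for each test snippet."""
    train_feats = [code_features(s) for s in train_sources]
    test_feats = [code_features(s) for s in test_sources]
    node_vocab = sorted(set().union(*train_feats, *test_feats))

    def vec(feats):
        return [float(feats.get(name, 0)) for name in node_vocab]

    train_kw = [docstring_keywords(s) for s in train_sources]
    all_counts = Counter(w for kws in train_kw for w in kws)
    word_vocab = sorted(all_counts)
    n_words = sum(all_counts.values())
    # Background (collection-level) word distribution used by the Dirichlet prior.
    background = {w: all_counts[w] / n_words for w in word_vocab}

    code_word = [word_given_code(k, word_vocab, background, mu) for k in train_kw]
    test_train = gaussian_kernel_rows(
        [vec(f) for f in test_feats], [vec(f) for f in train_feats], bandwidth
    )
    scores = matmul(test_train, code_word)  # rows: test snippets, columns: words
    return [
        [w for _, w in sorted(zip(row, word_vocab), reverse=True)[:top_k]]
        for row in scores
    ]


# Hypothetical sample data for illustration only.
TRAIN = [
    'def add(a, b):\n    """Add two numbers."""\n    return a + b\n',
    'def read(path):\n    """Read file contents."""\n    with open(path) as f:\n        return f.read()\n',
]
TEST = ['def sub(x, y):\n    return x - y\n']

print(annotate(TRAIN, TEST))
```

In this sketch the undocumented test function is structurally closest to the training function add, so its suggested keywords are drawn mostly from that function's docstring. A production system would likely use a richer code representation and tuned values for mu and bandwidth.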