US 12,379,923 B2
Systems and method for automated code analysis and tagging
Sean Moran, London (GB); Sanat Saha, Mumbai (IN); Gaurav Singh, Mumbai (IN); Fanny Silavong, London (GB); Antonios Georgiadis, London (GB); Ganesh Chandrasekar, London (GB); Andy Alexander, St Albans (GB); Rob Otter, Witham (GB); and Brett Sanford, Tunbridge (GB)
Assigned to JPMORGAN CHASE BANK, N.A., New York, NY (US)
Filed by JPMORGAN CHASE BANK, N.A., New York, NY (US)
Filed on Apr. 16, 2024, as Appl. No. 18/636,973.
Application 18/636,973 is a continuation of application No. 17/651,005, filed on Feb. 14, 2022, granted, now 12,008,365.
Claims priority of application No. 20210100433 (GR), filed on Jun. 28, 2021.
Prior Publication US 2024/0264828 A1, Aug. 8, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 8/75 (2018.01); G06F 8/41 (2018.01); G06F 17/16 (2006.01)

CPC G06F 8/75 (2013.01) [G06F 8/427 (2013.01); G06F 17/16 (2013.01)]

2 Claims

1. A method for automated code analysis and tagging, comprising:

receiving, by a code annotation computer program executed by a computer processor, a training code snippet from a training codebase;

parsing, by the code annotation computer program, the training code snippet into a data structure;

quantifying, by the code annotation computer program, the data structure;

parsing, by the code annotation computer program, a docstring associated with the training code snippet into a plurality of keywords;

quantifying, by the code annotation computer program, the plurality of keywords;

tuning, by the code annotation computer program, a bandwidth of a Gaussian kernel to compute a similarity between the quantified data structures and a smoothing parameter for a Dirichlet prior smoothing estimate, wherein annotations for the training code snippet are generated by matrix multiplication of a test code-training code probability matrix and a training code-word probability matrix, wherein the test code-training code probability matrix is computed from a Gaussian kernel density estimate between each test code snippet and each training code snippet; and

training, by the code annotation computer program, a code annotation model based on the similarity between the quantified data structure and the smoothing parameter for the Dirichlet prior smoothing estimate.
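
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and every name in it is hypothetical. It assumes the "data structure" is a Python abstract syntax tree, that "quantifying" means counting a small, arbitrarily chosen set of node types (NODE_TYPES), and that keywords are lowercase alphabetic tokens of the docstring. The bandwidth and the smoothing parameter mu are fixed by hand here; the claim describes tuning them, which this sketch does not do.

```python
import ast
import math
import re
from collections import Counter

# Assumed feature set: the claim does not say how the data structure is quantified.
NODE_TYPES = ["BinOp", "With", "Call", "Return"]

def quantify_code(snippet):
    """Parse a snippet into an AST and count the selected node types."""
    counts = Counter(type(node).__name__ for node in ast.walk(ast.parse(snippet)))
    return [float(counts[name]) for name in NODE_TYPES]

def docstring_keywords(snippet):
    """Split the docstring of the first function in the snippet into keywords."""
    funcs = [n for n in ast.walk(ast.parse(snippet)) if isinstance(n, ast.FunctionDef)]
    doc = ast.get_docstring(funcs[0]) if funcs else None
    return re.findall(r"[a-z]+", (doc or "").lower())

def kernel_probabilities(test_vecs, train_vecs, bandwidth):
    """Test code-training code matrix: row-normalized Gaussian kernel weights."""
    rows = []
    for x in test_vecs:
        weights = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, t)) / (2.0 * bandwidth ** 2))
            for t in train_vecs
        ]
        total = sum(weights)
        if total == 0.0:  # every weight underflowed: fall back to a uniform row
            rows.append([1.0 / len(train_vecs)] * len(train_vecs))
        else:
            rows.append([w / total for w in weights])
    return rows

def word_probabilities(keyword_lists, vocab, mu):
    """Training code-word matrix with Dirichlet prior smoothing (requires mu > 0)."""
    collection = Counter(w for kws in keyword_lists for w in kws)
    n_total = sum(collection.values())
    rows = []
    for kws in keyword_lists:
        counts = Counter(kws)
        rows.append([
            (counts[w] + mu * collection[w] / n_total) / (len(kws) + mu)
            for w in vocab
        ])
    return rows

def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][j] * b[j][k] for j in range(len(b))) for k in range(len(b[0]))]
        for i in range(len(a))
    ]

# Toy training codebase and one test snippet (illustrative data only).
TRAIN = [
    'def add(a, b):\n    """Add two numbers."""\n    return a + b\n',
    'def read_lines(path):\n    """Read lines from a file."""\n'
    '    with open(path) as f:\n        return f.readlines()\n',
]
TEST = 'def sub(x, y):\n    return x - y\n'

train_vecs = [quantify_code(s) for s in TRAIN]
keyword_lists = [docstring_keywords(s) for s in TRAIN]
vocab = sorted({w for kws in keyword_lists for w in kws})

p_test_train = kernel_probabilities([quantify_code(TEST)], train_vecs, bandwidth=1.0)
p_train_word = word_probabilities(keyword_lists, vocab, mu=1.0)

# Annotation scores: one row per test snippet, one column per vocabulary word.
scores = matmul(p_test_train, p_train_word)[0]
top_words = [w for _, w in sorted(zip(scores, vocab), reverse=True)[:3]]
print(top_words)
```

Because both factor matrices are row-normalized, each row of the product is a probability distribution over the vocabulary, so the highest-scoring words can be read off as candidate tags. In this toy example the test snippet has the same node counts as the first training snippet, so that snippet's docstring keywords receive the highest scores. One plausible way to tune the bandwidth and mu, not confirmed by the claim text, would be to select the values that best recover held-out docstring keywords.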