| CPC G06F 8/75 (2013.01) [G06F 8/427 (2013.01); G06F 17/16 (2013.01)] | 2 Claims |

|
1. A method for automated code analysis and tagging, comprising:
receiving, by a code annotation computer program executed by a computer processor, a training code snippet from a training codebase;
parsing, by the code annotation computer program, the training code snippet into a data structure;
quantifying, by the code annotation computer program, the data structure;
parsing, by the code annotation computer program, a docstring associated with the training code snippet into a plurality of keywords;
quantifying, by the code annotation computer program, the plurality of keywords;
tuning, by the code annotation computer program, a bandwidth of a Gaussian kernel to compute a similarity between the quantified data structures and a smoothing parameter for a Dirichlet prior smoothing estimate, wherein annotations for the training code snippet are generated by matrix multiplication of a test code-training code probability matrix and a training code-word probability matrix, wherein the test code-training code probability matrix is computed from a Gaussian kernel density estimate between each test code snippet and each training code snippet; and
training, by the code annotation computer program, a code annotation model based on the similarity between the quantified data structure and the smoothing parameter for the Dirichlet prior smoothing estimate.
|
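The matrix product recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: the function names, the toy feature vectors, and the fixed values of `bandwidth` and `mu` are assumptions made for illustration, and the tuning step is omitted. The sketch builds a row-normalized Gaussian kernel matrix between test and training snippets, a Dirichlet-prior-smoothed word distribution for each training docstring, and multiplies the two to score candidate annotation words.

```python
import math
from collections import Counter


def gaussian_kernel_probs(test_vecs, train_vecs, bandwidth):
    """Test code-training code matrix: row i holds P(train j | test i)."""
    rows = []
    for t in test_vecs:
        sq = [sum((a - b) ** 2 for a, b in zip(t, r)) for r in train_vecs]
        base = min(sq)  # shift so the largest weight is 1 (avoids underflow)
        w = [math.exp(-(d - base) / (2.0 * bandwidth ** 2)) for d in sq]
        total = sum(w)
        rows.append([x / total for x in w])
    return rows


def dirichlet_word_probs(train_keywords, vocab, mu):
    """Training code-word matrix with Dirichlet prior smoothing:
    P(w | d) = (count(w, d) + mu * P(w | collection)) / (len(d) + mu)."""
    collection = Counter(w for kws in train_keywords for w in kws)
    coll_total = sum(collection.values())
    rows = []
    for kws in train_keywords:
        counts = Counter(kws)
        rows.append([
            (counts[w] + mu * collection[w] / coll_total) / (len(kws) + mu)
            for w in vocab
        ])
    return rows


def matmul(a, b):
    """Plain-list matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Toy data (hypothetical): two training snippets with docstring keywords,
# already quantified as 2-dimensional feature vectors.
train_vecs = [[0.0, 1.0], [1.0, 0.0]]
train_keywords = [["parse", "json"], ["sort", "list"]]
vocab = sorted({w for kws in train_keywords for w in kws})
test_vecs = [[0.1, 0.9]]  # lies close to the first training snippet

code_code = gaussian_kernel_probs(test_vecs, train_vecs, bandwidth=0.5)
code_word = dirichlet_word_probs(train_keywords, vocab, mu=1.0)
scores = matmul(code_code, code_word)  # one row of word scores per test snippet
```

Because each row of both factor matrices sums to one, each row of `scores` is itself a probability distribution over the vocabulary. In this toy setup the test snippet sits near the first training snippet, so the keywords of that snippet's docstring receive the highest scores.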