CPC G06F 8/75 (2013.01) [G06F 8/427 (2013.01); G06F 17/16 (2013.01)] | 18 Claims |
1. A method for automated code analysis and tagging, comprising:
receiving, by a code annotation computer program executed by a computer processor, a training code snippet from a training codebase;
parsing, by the code annotation computer program, the training code snippet into a data structure;
quantifying, by the code annotation computer program, the data structure;
parsing, by the code annotation computer program, a docstring associated with the training code snippet into a plurality of keywords;
quantifying, by the code annotation computer program, the plurality of keywords;
training, by the code annotation computer program, a code annotation model based on a similarity between the quantified data structure and a smoothing parameter for a Dirichlet prior smoothing estimate; and
generating an annotation for the training code snippet by matrix multiplication of a test code-training code probability matrix and a training code-word probability matrix, the test code-training code probability matrix computed from a Gaussian kernel density estimate between each test code snippet and each training code snippet from the training codebase.
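For illustration only, the steps recited above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the "data structure" is assumed to be a Python abstract syntax tree, "quantifying" is assumed to mean counting node types and keywords, the Dirichlet prior smoothing estimate is assumed to take the standard language-model form (count + mu * corpus probability) / (length + mu), and the Gaussian kernel weights are row-normalized so each row can be read as a probability distribution. All function names, the sample snippets, the smoothing parameter `mu`, and the `bandwidth` value are hypothetical.

```python
import ast
import math
import re
from collections import Counter

# Hypothetical toy data: two documented training snippets, one undocumented test snippet.
TRAIN = [
    'def add(a, b):\n    """Add two numbers."""\n    return a + b\n',
    'def read_lines(path):\n    """Read lines from file."""\n'
    '    with open(path) as f:\n        return f.readlines()\n',
]
TEST = ['def mul(x, y):\n    return x * y\n']

def node_types(source):
    """Parse a snippet into an AST and list the type name of every node."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]

def code_vector(source, vocab):
    """Quantify the AST as a vector of node-type counts (an assumed encoding)."""
    counts = Counter(node_types(source))
    return [float(counts[name]) for name in vocab]

def docstring_keywords(source):
    """Split the first function docstring into lowercase keyword tokens."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            return re.findall(r"[a-z]+", (ast.get_docstring(node) or "").lower())
    return []

def dirichlet_word_probs(keyword_lists, words, mu):
    """Training code-word matrix with assumed Dirichlet prior smoothing."""
    corpus = Counter(w for kws in keyword_lists for w in kws)
    total = sum(corpus.values())
    rows = []
    for kws in keyword_lists:
        c = Counter(kws)
        rows.append([(c[w] + mu * corpus[w] / total) / (len(kws) + mu)
                     for w in words])
    return rows

def gaussian_kernel_probs(test_vecs, train_vecs, bandwidth):
    """Test code-training code matrix: row-normalized Gaussian kernel weights."""
    rows = []
    for t in test_vecs:
        weights = []
        for v in train_vecs:
            sq_dist = sum((a - b) ** 2 for a, b in zip(t, v))
            weights.append(math.exp(-sq_dist / (2 * bandwidth ** 2)))
        z = sum(weights)
        rows.append([w / z for w in weights])
    return rows

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

# Node-type vocabulary and keyword vocabulary come from the training set only.
vocab = sorted({name for src in TRAIN for name in node_types(src)})
keyword_lists = [docstring_keywords(src) for src in TRAIN]
words = sorted({w for kws in keyword_lists for w in kws})

code_word = dirichlet_word_probs(keyword_lists, words, mu=2.0)
test_train = gaussian_kernel_probs(
    [code_vector(s, vocab) for s in TEST],
    [code_vector(s, vocab) for s in TRAIN],
    bandwidth=1.0,
)

# Each row of `scores` is a distribution over annotation keywords for one test snippet.
scores = matmul(test_train, code_word)
best = words[max(range(len(words)), key=lambda j: scores[0][j])]
```

In this sketch the undocumented `mul` snippet is structurally closer to `add` than to `read_lines`, so the kernel places most of its weight on `add` and the highest-scoring keyword is drawn from that snippet's docstring. A real system would likely use a richer code encoding and tune `mu` and `bandwidth` on held-out data.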