US 12,481,866 B2
Method and system for maintaining latent Dirichlet allocation model accuracy
Deepanjan Chakraborti, Kolkata (IN); Luke Meier, Frisco, TX (US); Brandon Chihkai Yang, Chicago, IL (US); and Maria Beltran, Chicago, IL (US)
Assigned to JPMORGAN CHASE BANK, N.A., New York, NY (US)
Filed by JPMorgan Chase Bank, N.A., New York, NY (US)
Filed on Apr. 10, 2023, as Appl. No. 18/132,661.
Claims priority of application No. 202311012684 (IN), filed on Feb. 24, 2023.
Prior Publication US 2024/0289595 A1, Aug. 29, 2024
Int. Cl. G06F 40/30 (2020.01); G06F 40/279 (2020.01); G06N 3/0475 (2023.01); G06N 3/08 (2023.01)
CPC G06N 3/0475 (2023.01) [G06F 40/279 (2020.01)] 20 Claims
OG exemplary drawing
 
1. A method for maintaining freshness in a Latent Dirichlet Allocation (LDA) model, the method being implemented by at least one processor, the method comprising:
receiving, by the at least one processor, a first corpus that includes a first set of textual data;
generating, by the at least one processor, a first set of LDA topics by applying a first LDA model to the first corpus;
generating, by the at least one processor for each respective topic from among the first set of LDA topics, a corresponding vector of words with associated weights;
measuring, by the at least one processor, a cosine similarity of at least a first topic from among the first set of LDA topics with respect to at least one topic from among a second set of LDA topics that is associated with a second LDA model;
determining, based on a result of the measuring, a topic match entropy between the first LDA model and the second LDA model; and
when the topic match entropy exceeds a predetermined threshold, determining that the first LDA model has drifted and requires retraining,
wherein the topic match entropy is determined by applying an expression of Topic Match Entropy=(1st Cos D)/(1−(ABS|1st Cos D−2nd Cos D|)), where the 1st Cos D is equal to a first minimum cosine distance between Topic Ti and Topic Tj, and the 2nd Cos D is equal to a second minimum cosine distance between Topic Ti and Topic Tj.
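Writing d_1 for the 1st Cos D and d_2 for the 2nd Cos D, the expression in claim 1 can be set out as a formula with a worked instance. The distances in the worked line are illustrative values, not data from the patent. Reading d_1 and d_2 as the smallest and second-smallest cosine distances from T_i to the candidate topics T_j (so that d_1 ≤ d_2) is an interpretive assumption.

```latex
\text{Topic Match Entropy} = \frac{d_1}{1 - \lvert d_1 - d_2 \rvert}
\qquad\text{e.g.}\quad d_1 = 0.2,\; d_2 = 0.4:\quad
\frac{0.2}{1 - \lvert 0.2 - 0.4 \rvert} = \frac{0.2}{0.8} = 0.25
```

Under that reading, with both distances in [0, 1], the denominator equals 1 − d_2 + d_1, which is at least d_1, so the value never exceeds 1.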
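The step of generating, for each LDA topic, a vector of words with associated weights might look like the following minimal Python sketch. It is not the patent's implementation. It assumes the fitted model exposes a topic-word weight matrix, such as the `components_` attribute of scikit-learn's `LatentDirichletAllocation`. The function name `topic_vectors`, the row normalization, and the `top_n` cut-off are all hypothetical choices.

```python
from typing import Dict, List, Sequence


def topic_vectors(
    topic_word: Sequence[Sequence[float]],
    vocabulary: Sequence[str],
    top_n: int = 10,
) -> List[Dict[str, float]]:
    """Turn each row of a topic-word matrix into a word -> weight mapping.

    Hypothetical helper: rows are normalized to sum to 1, and only the
    top_n highest-weight words are kept for each topic.
    """
    vectors: List[Dict[str, float]] = []
    for row in topic_word:
        if len(row) != len(vocabulary):
            raise ValueError("each row must have one weight per vocabulary word")
        total = sum(row)
        # Normalize so the row is a distribution over the vocabulary.
        probs = [w / total for w in row] if total > 0 else [0.0] * len(row)
        # Highest weight first; ties broken alphabetically for determinism.
        ranked = sorted(zip(vocabulary, probs), key=lambda pair: (-pair[1], pair[0]))
        vectors.append(dict(ranked[:top_n]))
    return vectors


# Usage with a made-up two-topic matrix over a four-word vocabulary.
vocabulary = ["card", "fraud", "loan", "rate"]
weights = [[1.0, 1.0, 4.0, 4.0], [6.0, 3.0, 0.5, 0.5]]
print(topic_vectors(weights, vocabulary, top_n=2))
# prints [{'loan': 0.4, 'rate': 0.4}, {'card': 0.6, 'fraud': 0.3}]
```

Keeping only the top-weighted words matches the usual way LDA topics are inspected. The claim does not say whether the vectors are truncated.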
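The measuring, entropy, and drift-decision steps can be sketched in Python under several labeled assumptions. Each topic is a word-to-weight dictionary. Cosine distance is taken as one minus cosine similarity. The "1st Cos D" and "2nd Cos D" are read as the smallest and second-smallest distances from Topic Ti to the topics Tj of the second model. The per-topic values are combined by taking their maximum. The names `cosine_distance`, `topic_match_entropy`, and `has_drifted`, the example topics, and the threshold of 0.1 are all hypothetical.

```python
import math
from typing import Dict, List

# Assumption: a topic is a word -> weight mapping, i.e. the claim's
# "vector of words with associated weights".
Topic = Dict[str, float]


def cosine_distance(a: Topic, b: Topic) -> float:
    """One minus the cosine similarity of two sparse word-weight vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: an empty topic matches nothing
    # Clamp to [0, 1]; LDA weights are non-negative, so only rounding can stray.
    return min(1.0, max(0.0, 1.0 - dot / (norm_a * norm_b)))


def topic_match_entropy(topic: Topic, other_model: List[Topic]) -> float:
    """Topic Match Entropy = d1 / (1 - |d1 - d2|) for one topic Ti.

    Assumption: d1 and d2 are the smallest and second-smallest cosine
    distances from `topic` to the topics Tj of the other model.
    """
    if len(other_model) < 2:
        raise ValueError("the other model needs at least two topics")
    distances = sorted(cosine_distance(topic, other) for other in other_model)
    first, second = distances[0], distances[1]
    denominator = 1.0 - abs(first - second)
    if denominator <= 0.0:
        # Only when first is (numerically) 0 and second is 1: the expression
        # is 0/0 there, and the claim does not define that case.
        raise ValueError("Topic Match Entropy is undefined for these distances")
    return first / denominator


def has_drifted(first_model: List[Topic], second_model: List[Topic], threshold: float) -> bool:
    """Flag drift when the largest per-topic value exceeds the threshold (assumed aggregation)."""
    worst = max(topic_match_entropy(topic, second_model) for topic in first_model)
    return worst > threshold


# Usage with made-up topics: the reference model still has a close match.
current = [{"loan": 0.3, "rate": 0.4}]
reference = [{"loan": 0.4, "rate": 0.3}, {"rate": 0.5}, {"loan": 0.5}]
print(topic_match_entropy(current[0], reference))  # about 0.0476 (0.04 / 0.84)
print(has_drifted(current, reference, threshold=0.1))  # False
print(has_drifted(current, reference[1:], threshold=0.1))  # True (0.2 / 0.8 = 0.25)
```

Taking the maximum means that a single topic without a good counterpart is enough to trigger retraining. A mean or a count of failing topics would be an equally plausible reading of "the topic match entropy between the first LDA model and the second LDA model".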