| CPC G06N 3/0475 (2023.01) [G06F 40/279 (2020.01)] | 20 Claims |

|
1. A method for maintaining freshness in a Latent Dirichlet Allocation (LDA) model, the method being implemented by at least one processor, the method comprising:
receiving, by the at least one processor, a first corpus that includes a first set of textual data;
generating, by the at least one processor, a first set of LDA topics by applying a first LDA model to the first corpus;
generating, by the at least one processor for each respective topic from among the first set of LDA topics, a corresponding vector of words with associated weights;
measuring, by the at least one processor, a cosine similarity of at least a first topic from among the first set of LDA topics with respect to at least one topic from among a second set of LDA topics that is associated with a second LDA model;
determining, based on a result of the measuring, a topic match entropy between the first LDA model and the second LDA model; and
when the topic match entropy exceeds a predetermined threshold, determining that the first LDA model has drifted and requires retraining,
wherein the topic match entropy is determined by applying an expression of Topic Match Entropy=(1st Cos D)/(1−(ABS|1st Cos D−2nd Cos D|)), where the 1st Cos D is equal to a first minimum cosine distance between Topic Ti and Topic Tj, and the 2nd Cos D is equal to a second minimum cosine distance between Topic Ti and Topic Tj.
|
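The measuring and determining steps of claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation. The following are assumptions made only for illustration: each topic is a word-to-weight dictionary; cosine distance is taken as 1 minus cosine similarity; the "first" and "second" minimum cosine distances are read as the smallest and second-smallest distances from one topic Ti of the first model to the topics Tj of the second model; and drift is flagged when any single topic exceeds the threshold. The function names `cosine_distance`, `topic_match_entropy`, and `has_drifted` are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse word->weight vectors."""
    dot = sum(w * b.get(word, 0.0) for word, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: an all-zero topic is treated as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def topic_match_entropy(topic, other_topics):
    """Apply 1st Cos D / (1 - |1st Cos D - 2nd Cos D|) for one topic.

    Assumption: 1st and 2nd Cos D are the smallest and second-smallest
    cosine distances from `topic` to the topics in `other_topics`.
    """
    distances = sorted(cosine_distance(topic, t) for t in other_topics)
    if len(distances) < 2:
        raise ValueError("need at least two topics in the second model")
    first, second = distances[0], distances[1]
    denominator = 1.0 - abs(first - second)
    if denominator <= 0.0:
        # The expression is undefined here; the claim does not say how
        # to handle this case, so the sketch refuses to guess.
        raise ValueError("topic match entropy is undefined for these topics")
    return first / denominator


def has_drifted(first_model_topics, second_model_topics, threshold):
    """Assumption: drift is flagged if any topic exceeds the threshold."""
    return any(
        topic_match_entropy(topic, second_model_topics) > threshold
        for topic in first_model_topics
    )
```

Under these assumptions, a topic with an exact counterpart in the second model has a first minimum distance of 0 and therefore an entropy of 0. A topic whose best match is poor yields a larger value. The threshold itself is left to the caller, since the claim only requires that it be predetermined.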