US 12,386,867 B2
System and method for rapid initialization and transfer of topic models by a multi-stage approach
Omri Allouche, Tel Aviv (IL); Inbal Horev, Tel Aviv (IL); Eyal Ben David, Kibbutz Yagur (IL); and Adi Kopilov, Tel Aviv (IL)
Assigned to GONG.io Ltd., Ramat Gan (IL)
Filed by GONG.io Ltd., Ramat Gan (IL)
Filed on Jul. 27, 2022, as Appl. No. 17/815,294.
Prior Publication US 2024/0037126 A1, Feb. 1, 2024
Int. Cl. G06F 40/00 (2020.01); G06F 16/31 (2019.01); G06F 40/289 (2020.01); G06F 40/35 (2020.01)
CPC G06F 16/313 (2019.01) [G06F 40/289 (2020.01); G06F 40/35 (2020.01)]
19 Claims
 
1. A multi-stage method for creating topic models, comprising:
applying a first stage topic model to textual data, wherein the first stage topic model is trained to discover a first plurality of topics and distributions of words in each topic of the first plurality of topics from the textual data;
generating at least one seeded word for a subset of topics of the first plurality of topics, wherein the at least one seeded word is determined based on a plurality of selection rules and the distributions of words in the subset of topics discovered in the first stage topic model, wherein the at least one seeded word defines a topic in the subset of the first plurality of topics;
creating a second stage topic model that discovers a second plurality of topics in new input textual data; and
feeding the generated at least one seeded word to initialize the second stage topic model, wherein the initialization configures the second stage topic model to input the fed at least one seeded word as a dictionary, wherein the at least one seeded word identifies the subset of the first plurality of topics in the new input textual data, wherein the identified subset of the first plurality of topics is part of the second plurality of topics.
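
As an illustration only, the seeded-word generation step of claim 1 might be sketched as follows. The claim requires "a plurality of selection rules" but does not enumerate them here, so the three rules below (a minimum in-topic probability, an exclusivity ratio against other topics, and a cap per topic) are assumptions, as are the function name `select_seed_words`, its parameters, and the toy distributions standing in for first-stage output.

```python
from typing import Dict, List

TopicWord = Dict[str, Dict[str, float]]  # topic name -> {word: probability}


def select_seed_words(topic_word: TopicWord, min_prob: float = 0.05,
                      ratio: float = 2.0, top_k: int = 3) -> Dict[str, List[str]]:
    """Pick seed words per topic from first-stage word distributions.

    Hypothetical selection rules (assumed, not taken from the claims):
      1. the word's probability in the topic is at least `min_prob`;
      2. that probability is at least `ratio` times the word's probability
         in every other topic (exclusivity);
      3. at most `top_k` words are kept per topic, highest probability first.
    Topics with no qualifying word are omitted, so the result may cover
    only a subset of the first-stage topics.
    """
    seeds: Dict[str, List[str]] = {}
    for topic, dist in topic_word.items():
        candidates = []
        for word, p in dist.items():
            if p < min_prob:
                continue
            others = [d.get(word, 0.0) for t, d in topic_word.items() if t != topic]
            if all(p >= ratio * q for q in others):
                candidates.append((p, word))
        # Sort by descending probability, then alphabetically for stable ties.
        candidates.sort(key=lambda pw: (-pw[0], pw[1]))
        chosen = [w for _, w in candidates[:top_k]]
        if chosen:
            seeds[topic] = chosen
    return seeds


# Toy first-stage output (invented for illustration).
first_stage: TopicWord = {
    "pricing": {"price": 0.30, "discount": 0.20, "budget": 0.10, "call": 0.05, "team": 0.02},
    "scheduling": {"meeting": 0.25, "calendar": 0.20, "call": 0.06, "price": 0.01},
    "smalltalk": {"call": 0.05, "team": 0.04, "weather": 0.03},
}
seeds = select_seed_words(first_stage)
print(seeds)
```

In this toy input, "call" appears with similar weight in all three topics and fails the exclusivity rule, so the "smalltalk" topic yields no seed words and drops out of the seeded subset.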
 
10. A non-transitory computer readable medium having stored thereon instructions for causing a processing circuitry to execute a process, the process comprising:
applying a first stage topic model to textual data, wherein the first stage topic model is trained to discover a first plurality of topics and distributions of words in each topic of the first plurality of topics from the textual data;
generating at least one seeded word for a subset of the first plurality of topics, wherein the at least one seeded word is determined based on a plurality of selection rules and the distributions of words in the subset discovered in the first stage topic model, wherein the at least one seeded word defines a topic in the subset of the first plurality of topics;
creating a second stage topic model that discovers a second plurality of topics in new input textual data; and
feeding the generated at least one seeded word to initialize the second stage topic model, wherein the initialization configures the second stage topic model to input the fed at least one seeded word as a dictionary, wherein the at least one seeded word identifies the subset of the first plurality of topics in the new input textual data, wherein the identified subset of the first plurality of topics is part of the second plurality of topics.
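
The initialization step recited above could be sketched as below. The claims state only that the seeded words are input "as a dictionary"; how the second stage model consumes that dictionary is not specified in this excerpt. The sketch assumes one common technique from seeded (guided) topic modeling: raising the topic-word prior for each seed word in its own topic. The names `build_seeded_prior` and `prior_topic_shares`, the `base` and `boost` values, and the sample documents are all hypothetical.

```python
from typing import Dict, List, Tuple


def build_seeded_prior(seed_dict: Dict[str, List[str]], vocab: List[str],
                       n_topics: int, base: float = 0.01,
                       boost: float = 1.0) -> Tuple[List[str], List[Dict[str, float]]]:
    """Turn a seed dictionary into per-topic prior pseudo-counts.

    Seeded topics come first; any remaining topics are left unseeded so the
    second-stage model can still discover new topics in the new data.
    """
    if n_topics < len(seed_dict):
        raise ValueError("n_topics must cover every seeded topic")
    names = list(seed_dict) + [f"free_{i}" for i in range(n_topics - len(seed_dict))]
    prior = [{w: base for w in vocab} for _ in names]
    for k, words in enumerate(seed_dict.values()):
        for w in words:
            if w in prior[k]:  # seed words absent from the new vocabulary are skipped
                prior[k][w] += boost
    return names, prior


def prior_topic_shares(tokens: List[str], names: List[str],
                       prior: List[Dict[str, float]]) -> Dict[str, float]:
    """Share of prior mass each topic gives a document, before any training."""
    mass = [sum(p.get(t, 0.0) for t in tokens) for p in prior]
    total = sum(mass)
    return {n: (m / total if total else 0.0) for n, m in zip(names, mass)}


# Seed dictionary carried over from a first stage (invented for illustration).
seed_dict = {"pricing": ["price", "discount"], "scheduling": ["meeting", "calendar"]}
new_docs = [
    ["price", "discount", "please"],
    ["calendar", "meeting", "tomorrow"],
    ["weather", "nice"],
]
vocab = sorted({t for doc in new_docs for t in doc})
names, prior = build_seeded_prior(seed_dict, vocab, n_topics=3)

for doc in new_docs[:2]:
    shares = prior_topic_shares(doc, names, prior)
    print(doc, "->", max(shares, key=shares.get))
```

Under these assumptions the seeded topics are a subset of the three second-stage topics, and the unseeded `free_0` topic remains available for content, such as the third document, that matches no seed word.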