| CPC G06F 16/313 (2019.01) [G06F 40/289 (2020.01); G06F 40/35 (2020.01)] | 19 Claims |

|
1. A multi-stage method for creating topic models, comprising:
applying a first stage topic model to textual data, wherein the first stage topic model is trained to discover a first plurality of topics and distributions of words in each topic of the first plurality of topics from the textual data;
generating at least one seeded word for a subset of topics of the first plurality of topics, wherein the at least one seeded word is determined based on a plurality of selection rules and the distributions of words in the subset of topics discovered in the first stage topic model, wherein the at least one seeded word defines a topic in the subset of the first plurality of topics;
creating a second stage topic model that discovers a second plurality of topics in new input textual data; and
feeding the generated at least one seeded word to initialize the second stage topic model, wherein the initialization configures the second stage topic model to input the fed at least one seeded word as a dictionary, wherein the at least one seeded word identifies the subset of the first plurality of topics in the new input textual data, wherein the identified subset of the first plurality of topics is part of the second plurality of topics.
|
|
10. A non-transitory computer readable medium having stored thereon instructions for causing a processing circuitry to execute a process, the process comprising:
applying a first stage topic model to textual data, wherein the first stage topic model is trained to discover a first plurality of topics and distributions of words in each topic of the first plurality of topics from the textual data;
generating at least one seeded word for a subset of the first plurality of topics, wherein the at least one seeded word is determined based on a plurality of selection rules and the distributions of words in the subset discovered in the first stage topic model, wherein the at least one seeded word defines a topic in the subset of the first plurality of topics;
creating a second stage topic model that discovers a second plurality of topics in new input textual data; and
feeding the generated at least one seeded word to initialize the second stage topic model, wherein the initialization configures the second stage topic model to input the fed at least one seeded word as a dictionary, wherein the at least one seeded word identifies the subset of the first plurality of topics in the new input textual data, wherein the identified subset of the first plurality of topics is part of the second plurality of topics.
|