CPC G06F 16/93 (2019.01) [G06F 16/906 (2019.01)] | 16 Claims |
1. A method for generating keywords, the method comprising:
receiving, by the at least one processor, a corpus that includes a plurality of documents that describe a plurality of entities, wherein each respective one of the plurality of documents included in the corpus includes a description of a respective company;
selecting, by the at least one processor from the corpus, based on at least one predetermined criterion, a group of the plurality of documents, wherein the predetermined criterion relates to a correlation between a group of entities from the plurality of entities, and wherein the group of entities is described in the selected group of the plurality of documents;
determining, by the at least one processor, a first Inverse Document Frequency (IDF) score for each word in a vocabulary of the group of the plurality of documents with respect to the corpus;
determining, by the at least one processor, a second IDF score for each word in the vocabulary with respect to the group of the plurality of documents;
generating, by the at least one processor based on the first IDF score and the second IDF score, a Group Frequency Inverse Document Frequency (GFIDF) score for each word in the vocabulary;
assigning, by the at least one processor based on the generated GFIDF score for each word in the vocabulary, at least one keyword that describes the group of entities; and
displaying the at least one keyword via a graphical user interface (GUI), wherein the at least one keyword identifies a theme among companies within the group of the plurality of documents based on the respective descriptions, and the at least one keyword distinguishes companies within the group from the respective companies in the corpus.
|
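The scoring steps of claim 1 can be illustrated with a minimal sketch. The claim does not state how the first and second IDF scores are combined, so the ratio used here (corpus IDF divided by group IDF) is an assumption chosen only for illustration, as are the smoothed IDF formula, the whitespace tokenizer, and the hypothetical names `tokenize`, `idf_scores`, `gfidf_scores`, and `top_keywords`. Group selection by a predetermined criterion and the GUI display step are not modeled; the group is passed in as a list of document indices.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an illustrative assumption)."""
    return text.lower().split()


def idf_scores(documents, vocabulary):
    """Smoothed IDF of each vocabulary word over a list of word sets."""
    n = len(documents)
    doc_freq = Counter()
    for words in documents:
        # Count each word at most once per document.
        doc_freq.update(words & vocabulary)
    return {w: math.log((1 + n) / (1 + doc_freq[w])) + 1.0 for w in vocabulary}


def gfidf_scores(corpus_texts, group_indices):
    """Combine corpus-level and group-level IDF into one score per word.

    Assumption: score = first IDF (vs. corpus) / second IDF (vs. group),
    so words common inside the group but rare in the corpus rank highest.
    """
    docs = [set(tokenize(t)) for t in corpus_texts]
    group_docs = [docs[i] for i in group_indices]
    vocabulary = set().union(*group_docs)      # vocabulary of the group only
    first = idf_scores(docs, vocabulary)       # with respect to the corpus
    second = idf_scores(group_docs, vocabulary)  # with respect to the group
    return {w: first[w] / second[w] for w in vocabulary}


def top_keywords(corpus_texts, group_indices, k=1):
    """Return the k highest-scoring words, ties broken alphabetically."""
    scores = gfidf_scores(corpus_texts, group_indices)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Made-up company descriptions, for illustration only.
corpus = [
    "acme builds solar panels for homes",
    "brightco installs solar farms and solar storage",
    "sunrise finances solar projects for homes",
    "delta bank offers loans for homes",
    "echo makes software for banks",
    "foxtrot sells shoes and boots",
]
group = [0, 1, 2]  # assumed to have been selected by some criterion
keywords = top_keywords(corpus, group, k=1)
```

Under these assumptions a word that appears in every document of the group but in few documents outside it receives a low group IDF and a comparatively high corpus IDF, which is the behavior the claim attributes to a keyword that identifies a theme within the group and distinguishes the group from the rest of the corpus.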