US 12,235,907 B1
Method and system for generating keywords
Kassiani Papasotiriou, New York, NY (US); Tucker Richard Balch, Suwanee, GA (US); Srijan Sood, New York, NY (US); Shayleen Reynolds, New York, NY (US); Lawrence Yong, Harrison, NJ (US); Nilgun Ilayda Akalin, Lincoln Park, NJ (US); and Manuela Veloso, New York, NY (US)
Assigned to JPMORGAN CHASE BANK, N.A., New York, NY (US)
Filed by JPMorgan Chase Bank, N.A., New York, NY (US)
Filed on Sep. 21, 2023, as Appl. No. 18/371,283.
Claims priority of application No. 20230100728 (GR), filed on Sep. 11, 2023.
Int. Cl. G06F 16/906 (2019.01); G06F 16/93 (2019.01)
CPC G06F 16/93 (2019.01) [G06F 16/906 (2019.01)]
16 Claims
1. A method for generating keywords, the method being implemented by at least one processor, the method comprising:
receiving, by the at least one processor, a corpus that includes a plurality of documents that describe a plurality of entities, wherein each respective one of the plurality of documents included in the corpus includes a description of a respective company;
selecting, by the at least one processor from the corpus, based on at least one predetermined criterion, a group of the plurality of documents, wherein the predetermined criterion relates to a correlation between a group of entities from the plurality of entities, and wherein the group of entities is described in the selected group of the plurality of documents;
determining, by the at least one processor, a first Inverse Document Frequency (IDF) score for each word in a vocabulary of the group of the plurality of documents with respect to the corpus;
determining, by the at least one processor, a second IDF score for each word in the vocabulary with respect to the group of the plurality of documents;
generating, by the at least one processor based on the first IDF score and the second IDF score, a Group Frequency Inverse Document Frequency (GFIDF) score for each word in the vocabulary;
assigning, by the at least one processor based on the generated GFIDF score for each word in the vocabulary, at least one keyword that describes the group of entities; and
displaying the at least one keyword via a graphical user interface (GUI), wherein the at least one keyword identifies a theme among companies within the group of the plurality of documents based on the respective descriptions, and the at least one keyword distinguishes companies within the group from the respective companies in the corpus.
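
The scoring steps recited in claim 1 can be illustrated with a minimal Python sketch. The claim states only that the GFIDF score is generated "based on" the first and second IDF scores and gives no formula, so the combination used here (corpus IDF multiplied by the group document frequency recovered from the group IDF) is an assumption made for illustration, not the patented formula. The function names, the tokenizer, the sample company descriptions, and the `top_k` parameter are likewise hypothetical. The group is taken as already selected and is assumed to be a subset of the corpus.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a description and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_frequency(docs):
    """Count, for each word, how many documents contain it at least once."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return df


def idf(df, n_docs, word):
    """Plain IDF, log(N / df). The word must occur in at least one document."""
    return math.log(n_docs / df[word])


def gfidf_keywords(corpus, group, top_k=2):
    """Rank the group's vocabulary by an illustrative GFIDF score.

    ASSUMPTION: score = first_idf * exp(-second_idf), i.e. corpus IDF
    weighted by the fraction of group documents containing the word.
    The claim does not specify how the two IDF scores are combined.
    """
    df_corpus = document_frequency(corpus)
    df_group = document_frequency(group)
    scores = {}
    for word in df_group:  # vocabulary of the selected group
        first_idf = idf(df_corpus, len(corpus), word)   # with respect to the corpus
        second_idf = idf(df_group, len(group), word)    # with respect to the group
        scores[word] = first_idf * math.exp(-second_idf)
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Hypothetical company descriptions; the first three form the selected group.
corpus = [
    "Acme builds solar panels for homes",
    "Sunrise installs solar panels and batteries",
    "Helio finances solar projects",
    "Brick Bank offers loans for homes",
    "Quick Pay processes card payments",
    "Metro Foods delivers groceries",
    "North Rail operates freight trains",
    "Clear Water treats municipal water",
]
group = corpus[:3]

keywords = gfidf_keywords(corpus, group, top_k=2)
print([word for word, _ in keywords])  # prints ['solar', 'panels']
```

Under this assumed combination, a word scores highly when it appears in most documents of the group yet in few documents of the corpus overall, which matches the claim's stated aim of keywords that identify a theme within the group and distinguish it from the rest of the corpus.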