US 11,755,633 B2
Entity search system
Christopher F. Ackermann, Fairfax, VA (US); Charles E. Beller, Baltimore, MD (US); Michael Drzewucki, Woodbridge, VA (US); and Kristen Maria Summers, Takoma Park, MD (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Sep. 28, 2020, as Appl. No. 17/034,190.
Prior Publication US 2022/0100785 A1, Mar. 31, 2022
Int. Cl. G06F 16/33 (2019.01); G06F 40/211 (2020.01)
CPC G06F 16/3344 (2019.01) [G06F 40/211 (2020.01)]
17 Claims
 
1. A computer-implemented method comprising:
receiving, by one or more processors, a request to search a corpus of documents for an entity, wherein the request includes a non-name identifier of the entity;
identifying, by one or more processors, one or more likely variants of the non-name identifier of the entity, wherein likely variants are determined according to a class of the non-name identifier;
identifying, by one or more processors, entries of text within the corpus of documents that reference any variant of the identified likely variants of the non-name identifier;
applying, by one or more processors, natural language processing (NLP) to content associated with the identified entries within the corpus of documents, wherein the NLP identifies candidate entities associated with any variant of the identified likely variants of the non-name identifier;
determining, by one or more processors, an entity score for each candidate entity, wherein the entity score for a candidate entity is based, at least in part, on an inverse of a sum of the distances between the candidate entity and the references to the non-name identifier in the identified entries;
selecting, by one or more processors, an entity from the candidate entities based, at least in part, on the determined entity scores for the candidate entities, distances between the candidate entities, and references to any variant of the identified likely variants of the non-name identifier in the identified entries; and
returning, by one or more processors, the selected entity to a submitter of the request.