US 11,709,867 B2
Categorization of document content based on entity relationships
Jeb R. Linton, Manassas, VA (US); Rhonda L. Childress, Austin, TX (US); George E. Stark, Lakeway, TX (US); and Charles Palmer, Hanover, NH (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Nov. 28, 2017, as Appl. No. 15/823,736.
Prior Publication US 2019/0163804 A1, May 30, 2019
Int. Cl. G06F 16/28 (2019.01); G06F 16/93 (2019.01); G06F 16/951 (2019.01); G06F 16/35 (2019.01)
CPC G06F 16/285 (2019.01) [G06F 16/288 (2019.01); G06F 16/353 (2019.01); G06F 16/93 (2019.01); G06F 16/951 (2019.01)]
17 Claims
 
1. A method comprising:
obtaining a document of written content, wherein the document is a candidate for inclusion in a corpus;
identifying a first entity associated with the document;
identifying a first discrete entity associated with the first entity;
analyzing a relationship between the first entity and the first discrete entity, wherein the analyzing comprises vectorizing entity-relationship information related to the first entity and processing the vectorized information in a neural network;
determining, based on the analyzing, a likelihood that the document contains content that would be detrimental for inclusion in the corpus; and
rejecting, based on the analyzing and the likelihood, the document from the corpus.
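
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not specify a vectorization scheme, a network architecture, or a rejection threshold, so everything named here is an assumption made for illustration. That includes the relationship vocabulary RELATION_TYPES, the flagged_entities set, the hand-picked untrained weights, the 0.5 threshold, and the function names. Relationships are modeled as (entity, relation, discrete entity) triples, vectorized as simple counts, and passed through a tiny two-layer feed-forward network whose sigmoid output is read as the likelihood that the document would be detrimental to the corpus.

```python
import math

# Hypothetical relationship vocabulary; the claim does not enumerate relationship types.
RELATION_TYPES = ["authored_by", "funded_by", "affiliated_with", "cited_by"]

# Illustrative, hand-picked weights (assumption). A real system would learn these.
HIDDEN_W = [[0.1, 0.5, 0.2, -0.3, 2.0],
            [-0.2, 0.1, 0.1, 0.4, -1.0]]
HIDDEN_B = [0.0, 0.5]
OUT_W = [1.5, -1.0]
OUT_B = -1.0


def vectorize(relationships, flagged_entities):
    """Map (entity, relation, discrete_entity) triples to a fixed-length vector.

    The first len(RELATION_TYPES) slots count each relation type; the last slot
    counts links whose discrete entity is in the (hypothetical) flagged set.
    """
    vec = [0.0] * (len(RELATION_TYPES) + 1)
    for _entity, relation, discrete_entity in relationships:
        if relation in RELATION_TYPES:
            vec[RELATION_TYPES.index(relation)] += 1.0
        if discrete_entity in flagged_entities:
            vec[-1] += 1.0
    return vec


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(vec):
    """One ReLU hidden layer followed by a sigmoid output unit."""
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, vec)) + b)
        for row, b in zip(HIDDEN_W, HIDDEN_B)
    ]
    return sigmoid(sum(w * h for w, h in zip(OUT_W, hidden)) + OUT_B)


def screen_document(relationships, flagged_entities, threshold=0.5):
    """Return (likelihood, rejected) for a candidate document."""
    likelihood = forward(vectorize(relationships, flagged_entities))
    return likelihood, likelihood >= threshold


# Candidate whose author (first entity) is funded by a flagged organization
# (first discrete entity). All names are made up.
risky = [("doc-1", "authored_by", "author-a"),
         ("author-a", "funded_by", "org-x")]
benign = [("doc-2", "authored_by", "author-b"),
          ("author-b", "affiliated_with", "univ-y")]

risky_likelihood, risky_rejected = screen_document(risky, {"org-x"})
benign_likelihood, benign_rejected = screen_document(benign, {"org-x"})
```

With these illustrative weights the first candidate scores above the threshold and is rejected from the corpus, while the second scores below it and is kept. The separation comes entirely from the hand-picked weights and would have to be learned from labeled examples in practice.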