US 11,809,500 B2
System and method for finding similar documents based on semantic factual similarity
Mina Farid, Waterloo (CA); Brian Zubert, Waterloo (CA); Lisa Bender, Alma (CA); and Hella-Franziska Hoffmann, London (GB)
Assigned to THOMSON REUTERS ENTERPRISE CENTRE GMBH, Zug (CH)
Filed by Thomson Reuters Enterprise Centre GmbH, Zug (CH)
Filed on Nov. 27, 2017, as Appl. No. 15/822,522.
Claims priority of provisional application 62/426,727, filed on Nov. 28, 2016.
Prior Publication US 2018/0150459 A1, May 31, 2018
Int. Cl. G06F 16/93 (2019.01); G06F 16/22 (2019.01); G06F 16/28 (2019.01); G06F 16/2458 (2019.01); G06F 16/2457 (2019.01); G06F 16/36 (2019.01)
CPC G06F 16/93 (2019.01) [G06F 16/22 (2019.01); G06F 16/2465 (2019.01); G06F 16/24578 (2019.01); G06F 16/285 (2019.01); G06F 16/36 (2019.01); G06F 2216/03 (2013.01)]
18 Claims
 
1. A method for finding documents, comprising:
ingesting at least two library documents by extracting and indexing library triples therefrom;
expanding the library triples based on a semantic corpus to obtain expanded library triples;
indexing the expanded library triples while maintaining a record of the library document from which the library triples used to obtain them were extracted;
receiving a reference text string that is not one of the ingested library documents;
extracting one or more reference triples from the reference text string;
expanding at least one reference triple of the one or more reference triples based on a semantic corpus to obtain at least one expanded reference triple, wherein the expanding of the at least one reference triple comprises normalizing one or more tokens of the at least one reference triple to a base form prior to further expansion of the reference triple based on the semantic corpus;
identifying one or more of the library triples and expanded library triples similar to at least one reference triple of the one or more reference triples or at least one expanded reference triple; and
returning a list of one or more result library documents based on the identified library triples and expanded library triples.
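The steps of claim 1 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the patented implementation. The claim does not say how triples are extracted, which semantic corpus is used, or how similarity is scored, so everything here is a hypothetical stand-in: `extract_triples` is a toy extractor for "subject predicate object" sentences, `BASE_FORMS` is a toy lemma table that performs the normalization to a base form, `SYNONYM_GROUPS` plays the role of the semantic corpus, and documents are ranked by a simple count of matched reference triples. A real system would more likely use an open information extraction tool, a proper lemmatizer, and a thesaurus-style resource. Similarity is reduced to exact matching of triples after expansion, and the same `expand` step (including normalization) is applied to library triples as well as reference triples.

```python
from collections import Counter, defaultdict
from itertools import product

Triple = tuple[str, str, str]

# Hypothetical toy semantic corpus: groups of interchangeable terms,
# turned into a symmetric map term -> related terms.
SYNONYM_GROUPS = [{"car", "vehicle"}, {"strike", "hit"}, {"pedestrian", "walker"}]
SEMANTIC_CORPUS = {w: g - {w} for g in SYNONYM_GROUPS for w in g}
# Hypothetical lemma table used to normalize tokens to a base form.
BASE_FORMS = {"struck": "strike", "strikes": "strike", "cars": "car"}
STOPWORDS = {"a", "an", "the"}


def extract_triples(text: str) -> list[Triple]:
    """Toy extractor: each sentence becomes (first word, middle words, last word)."""
    triples = []
    for sentence in text.split("."):
        words = [w for w in sentence.split() if w.lower() not in STOPWORDS]
        if len(words) >= 3:
            triples.append((words[0], " ".join(words[1:-1]), words[-1]))
    return triples


def normalize(token: str) -> str:
    """Lowercase a token and map it to its base form, if the table knows one."""
    token = token.lower()
    return BASE_FORMS.get(token, token)


def expand(triple: Triple) -> set[Triple]:
    """Normalize every token first, then add semantic-corpus alternatives."""
    options = []
    for token in triple:
        base = normalize(token)
        options.append({base} | SEMANTIC_CORPUS.get(base, set()))
    return set(product(*options))


def ingest(library: dict[str, str]) -> dict[Triple, set[str]]:
    """Index library triples and their expansions, recording each source document."""
    index: dict[Triple, set[str]] = defaultdict(set)
    for doc_id, text in library.items():
        for triple in extract_triples(text):
            index[triple].add(doc_id)  # the extracted library triple itself
            for expanded in expand(triple):
                index[expanded].add(doc_id)  # provenance kept for each expansion
    return index


def find_similar(index: dict[Triple, set[str]], reference: str) -> list[str]:
    """Rank library documents by how many reference triples they match."""
    scores: Counter[str] = Counter()
    for triple in extract_triples(reference):
        matched_docs: set[str] = set()
        for candidate in {triple} | expand(triple):
            matched_docs |= index.get(candidate, set())
        scores.update(matched_docs)  # each reference triple counts once per document
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


library = {
    "doc1": "The car struck a pedestrian.",
    "doc2": "The vehicle hit a walker. The driver left the scene.",
    "doc3": "The court dismissed the appeal.",
}
index = ingest(library)
print(find_similar(index, "A car strikes the pedestrian. The driver left the scene."))
# → ['doc2', 'doc1']
```

In this run, "strikes" is normalized to "strike" before expansion, so the first reference triple reaches both doc1 and doc2 through different surface wordings, while only doc2 also matches the second triple and therefore ranks first. Keeping a set of document IDs per indexed triple is one simple way to maintain the record of which library document each expanded triple came from.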