US 11,886,814 B2
Systems and methods for deviation detection, information extraction and obligation deviation detection
Sally Gao, Boston, MA (US); Hella-Franziska Hoffmann, London (GB); Nina Hristozova, Zurich (CH); Elizabeth Roman, Somerville, MA (US); Nicolai Pogrebnyakov, Ontario (CA); Yue Feng, Ontario (CA); Masoud Makrehchi, Ontario (CA); Tate Sterling Avery, Ontario (CA); Shohreh Shaghaghian, Ontario (CA); and Borna Jafarpour, Ontario (CA)
Assigned to THOMSON REUTERS ENTERPRISE CENTRE GMBH, Zug (CH)
Filed by Thomson Reuters Enterprise Centre GmbH, Zug (CH)
Filed on Jan. 23, 2021, as Appl. No. 17/156,567.
Claims priority of provisional application 62/975,514, filed on Feb. 12, 2020.
Claims priority of provisional application 62/965,516, filed on Jan. 24, 2020.
Claims priority of provisional application 62/965,523, filed on Jan. 24, 2020.
Claims priority of provisional application 62/965,520, filed on Jan. 24, 2020.
Prior Publication US 2021/0294974 A1, Sep. 23, 2021
Int. Cl. G06F 40/279 (2020.01); G06F 40/242 (2020.01); G06F 3/0481 (2022.01); G06F 40/166 (2020.01); G06F 40/289 (2020.01); G06F 40/258 (2020.01); G06F 40/284 (2020.01); G06F 40/109 (2020.01); G06F 40/137 (2020.01); G06F 40/232 (2020.01); G06V 30/416 (2022.01)

CPC G06F 40/279 (2020.01) [G06F 3/0481 (2013.01); G06F 40/109 (2020.01); G06F 40/137 (2020.01); G06F 40/166 (2020.01); G06F 40/232 (2020.01); G06F 40/242 (2020.01); G06F 40/258 (2020.01); G06F 40/284 (2020.01); G06F 40/289 (2020.01); G06V 30/416 (2022.01)]

7 Claims

1. A method for extracting information, comprising:

receiving an input text;

splitting the input text into n-grams while retaining a case of words as a feature;

for each n-gram, determining whether it is a capitalized concatenated sequence of words and calculating a frequency of the n-gram's appearance in the input text relative to how rarely the n-gram is used in general use;

in response to a first determination that a particular n-gram is a capitalized concatenated sequence of words and a second determination that the particular n-gram has a relative frequency above a predetermined threshold, identifying the particular n-gram as a defined term from the input text;

identifying a definition of each defined term from the input text; and

displaying the definition of a defined term while also displaying a portion of the input text in which the defined term appears but that is different from a portion of the input text identified as the definition of the defined term.
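
The term-identification steps of claim 1 (case-preserving n-grams, the capitalization test, and the relative-frequency threshold) can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a TF-IDF-style score, a hypothetical table GENERAL_USE_FREQ standing in for general-use statistics, an illustrative threshold of 0.5, and a simple regex tokenizer that lets n-grams cross sentence boundaries. All function and variable names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical general-use statistics: the fraction of everyday documents
# that contain each phrase (keys are lowercase). Values are illustrative only.
GENERAL_USE_FREQ = {"the": 0.9, "effective": 0.05, "date": 0.1,
                    "effective date": 0.001}

SAMPLE = ('"Effective Date" means the date of signing. '
          'The Effective Date may be extended. '
          'The parties agree that the Effective Date is binding.')


def ngrams(tokens, max_n):
    """Yield every n-gram of length 1..max_n as a tuple of tokens."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def is_capitalized_sequence(gram):
    """True when every word of the n-gram starts with an upper-case letter."""
    return all(word[0].isupper() for word in gram)


def relative_frequency(count, total_tokens, general_freq):
    """Assumed TF-IDF-style score: in-text frequency weighted by rarity."""
    return (count / total_tokens) * math.log(1.0 / general_freq)


def find_defined_terms(text, general_freq, threshold=0.5,
                       max_n=3, default_freq=1e-4):
    """Return capitalized n-grams whose score exceeds the threshold."""
    tokens = re.findall(r"[A-Za-z]+", text)  # case is retained as a feature
    counts = Counter(ngrams(tokens, max_n))
    terms = set()
    for gram, count in counts.items():
        if not is_capitalized_sequence(gram):
            continue
        phrase = " ".join(gram)
        # Phrases missing from the table are assumed to be rare in general use.
        rarity = general_freq.get(phrase.lower(), default_freq)
        if relative_frequency(count, len(tokens), rarity) > threshold:
            terms.add(phrase)
    return terms


if __name__ == "__main__":
    print(sorted(find_defined_terms(SAMPLE, GENERAL_USE_FREQ)))
    # prints ['Effective Date']
```

In this sample, "Effective Date" is both capitalized and frequent relative to its assumed rarity, so it passes both determinations. "The" is capitalized but too common in general use, and "The Effective" appears only once, so both fall below the threshold.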