US 11,657,101 B2
Document information extraction system using sequenced comparators
Prabhdeep Singh Walia, Bengaluru (IN); and Vikas Kushwaha, Delhi (IN)
Assigned to Goldman Sachs & Co. LLC, New York, NY (US)
Filed by Goldman Sachs & Co. LLC, New York, NY (US)
Filed on Jan. 13, 2020, as Appl. No. 16/740,754.
Prior Publication US 2021/0216595 A1, Jul. 15, 2021
Int. Cl. G06F 16/93 (2019.01); G06F 16/22 (2019.01); G06F 16/28 (2019.01); G06F 16/904 (2019.01); G06F 40/103 (2020.01)
CPC G06F 16/93 (2019.01) [G06F 16/2246 (2019.01); G06F 16/288 (2019.01); G06F 16/904 (2019.01); G06F 40/103 (2020.01)]
27 Claims
 
1. A computer-implemented method for determining a hierarchical structure of an electronic document, the method comprising:
segmenting the document into a plurality of elements that, in aggregate, include the hierarchical structure, and each element having one or more visual characteristics and one or more location characteristics;
applying a master comparator including a set of unit comparators to the segmented plurality of elements from the document to determine the hierarchical structure of the document, the master comparator determining the hierarchical structure by:
for each pair of elements in the document:
applying a unit comparator of the set of unit comparators to the pair of elements according to a predefined ordered sequence to generate an output digit, the unit comparator comparing a visual characteristic or a location characteristic of the pair of elements in the document to determine the output digit;
determining a familial relationship between the pair of elements indicated by the output digit;
responsive to the determined familial relationship for the pair of elements being a sibling relationship, applying a next unit comparator of the set of unit comparators to the pair of elements according to the predefined ordered sequence, the next unit comparator comparing a different visual characteristic or a different location characteristic of the pair of elements; and
responsive to the determined familial relationship for the pair of elements being a parent relationship or an unrelated relationship, applying the master comparator to a next pair of elements in the document;
wherein the determined familial relationships between each pair of elements of the plurality of elements identify the hierarchical structure of the document; and
generating, for display on a client device, a visualization of a document hierarchy tree representing the hierarchical structure of the document, the visualization illustrating the determined familial relationships between each pair of elements of the plurality of elements in the document.
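Claim 1 describes the comparator logic only abstractly. The Python below is one hypothetical reading of it, not the patentee's implementation. The claim does not state any of the following, so each is an assumption. Every element carries a font size and a bold flag (visual characteristics) and a left x coordinate (location characteristic). The predefined ordered sequence is font size, then boldness, then indentation. The output digit for an ordered pair (earlier element, later element) is encoded as 0 = sibling, 1 = parent, 2 = unrelated. Each element's parent in the tree is the nearest earlier element that the master comparator labels as its parent. A plain-text indented tree stands in for the claimed client-side visualization.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical output-digit encoding; the claim does not fix these values.
SIBLING, PARENT, UNRELATED = 0, 1, 2


@dataclass
class Element:
    text: str
    font_size: float  # visual characteristic (assumed)
    bold: bool        # visual characteristic (assumed)
    x: float          # location characteristic: left edge / indent (assumed)


UnitComparator = Callable[[Element, Element], int]


def _digit(a_rank: float, b_rank: float) -> int:
    """Map a comparison of the earlier element `a` against the later `b` to a digit."""
    if a_rank > b_rank:
        return PARENT      # a outranks b on this characteristic
    if a_rank == b_rank:
        return SIBLING     # tie: undecided, defer to the next comparator
    return UNRELATED       # b outranks a, so a cannot be b's parent


def by_font_size(a: Element, b: Element) -> int:
    return _digit(a.font_size, b.font_size)


def by_bold(a: Element, b: Element) -> int:
    return _digit(int(a.bold), int(b.bold))


def by_indent(a: Element, b: Element) -> int:
    # A smaller left indent outranks a larger one, hence the negation.
    return _digit(-a.x, -b.x)


# Hypothetical "predefined ordered sequence" of unit comparators.
SEQUENCE: List[UnitComparator] = [by_font_size, by_bold, by_indent]


def master_compare(a: Element, b: Element) -> int:
    """Apply unit comparators in order, stopping at the first non-sibling digit."""
    digit = SIBLING
    for unit in SEQUENCE:
        digit = unit(a, b)
        if digit != SIBLING:  # parent or unrelated: move on to the next pair
            break
    return digit  # SIBLING only if every unit comparator tied


def build_hierarchy(elements: List[Element]) -> List[Optional[int]]:
    """Return each element's parent index (None for top-level elements)."""
    n = len(elements)
    # Run the master comparator on every ordered pair (earlier i, later j).
    relations: Dict[Tuple[int, int], int] = {
        (i, j): master_compare(elements[i], elements[j])
        for j in range(n)
        for i in range(j)
    }
    parents: List[Optional[int]] = []
    for j in range(n):
        # Assumed rule: the nearest earlier element marked PARENT is the tree parent.
        candidates = [i for i in range(j) if relations[(i, j)] == PARENT]
        parents.append(max(candidates) if candidates else None)
    return parents


def render_tree(elements: List[Element], parents: List[Optional[int]]) -> str:
    """Plain-text stand-in for the document hierarchy tree visualization."""
    def depth(k: int) -> int:
        d = 0
        while parents[k] is not None:
            k = parents[k]
            d += 1
        return d

    return "\n".join("  " * depth(k) + e.text for k, e in enumerate(elements))


doc = [
    Element("1 Introduction", 16, True, 0),
    Element("1.1 Scope", 13, True, 0),
    Element("Body text", 11, False, 0),
    Element("Indented note", 11, False, 24),
    Element("1.2 Terms", 13, True, 0),
    Element("2 Methods", 16, True, 0),
]
parents = build_hierarchy(doc)
print(parents)  # [None, 0, 1, 2, 0, None]
print(render_tree(doc, parents))
```

In this example "Indented note" ties with "Body text" on font size and boldness, so each of those comparators yields a sibling digit and the master comparator falls through to the indentation comparator, which yields a parent digit. Pairs that already produce a parent or unrelated digit on font size never reach the later comparators. This mirrors the early exit recited in the claim.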