CPC G06F 16/93 (2019.01) [G06F 16/2246 (2019.01); G06F 16/288 (2019.01); G06F 16/904 (2019.01); G06F 40/103 (2020.01)] | 27 Claims |
1. A computer-implemented method for determining a hierarchical structure of an electronic document, the method comprising:
segmenting the document into a plurality of elements that, in aggregate, include the hierarchical structure, and each element having one or more visual characteristics and one or more location characteristics;
applying a master comparator including a set of unit comparators to the segmented plurality of elements from the document to determine the hierarchical structure of the document, the master comparator determining the hierarchical structure by:
for each pair of elements in the document:
applying a unit comparator of the set of unit comparators to the pair of elements according to a predefined ordered sequence to generate an output digit, the unit comparator comparing a visual characteristic or a location characteristic of the pair of elements in the document to determine the output digit;
determining a familial relationship between the pair of elements indicated by the output digit;
responsive to the determined familial relationship for the pair of elements being a sibling relationship, applying a next unit comparator of the set of unit comparators to the pair of elements according to the predefined ordered sequence, the next unit comparator comparing a different visual characteristic or a different location characteristic of the pair of elements; and
responsive to the determined familial relationship for the pair of elements being a parent relationship or an unrelated relationship, applying the master comparator to a next pair of elements in the document;
wherein the determined familial relationships between each pair of elements of the plurality of elements identify the hierarchical structure of the document; and
generating, for display on a client device, a visualization of a document hierarchy tree representing the hierarchical structure of the document, the visualization illustrating the determined familial relationships between each pair of elements of the plurality of elements in the document.
|
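The pairwise comparison loop recited in claim 1 can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the claim: the digit encoding (0 for sibling, 1 for parent, 2 for unrelated), the `Element` fields, the three unit comparators and their order, and the stack-based `build_tree` helper that turns pairwise relationships into a tree. The claim only requires that unit comparators be applied in a predefined ordered sequence, that a sibling result triggers the next unit comparator, and that a parent or unrelated result ends the comparison for that pair.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical output-digit encoding (not specified in the claim).
SIBLING, PARENT, UNRELATED = 0, 1, 2


@dataclass
class Element:
    text: str
    font_size: float  # visual characteristic
    bold: bool        # visual characteristic
    indent: int       # location characteristic
    children: List["Element"] = field(default_factory=list)


UnitComparator = Callable[[Element, Element], int]


def _digit(rank_a, rank_b) -> int:
    """Map two ranks to an output digit; a higher rank means 'more heading-like'."""
    if rank_a == rank_b:
        return SIBLING
    return PARENT if rank_a > rank_b else UNRELATED


def by_font_size(a: Element, b: Element) -> int:
    return _digit(a.font_size, b.font_size)


def by_bold(a: Element, b: Element) -> int:
    return _digit(a.bold, b.bold)


def by_indent(a: Element, b: Element) -> int:
    # Assumption: a smaller indent outranks a larger one.
    return _digit(-a.indent, -b.indent)


# Assumed predefined ordered sequence of unit comparators.
UNIT_COMPARATORS: List[UnitComparator] = [by_font_size, by_bold, by_indent]


def master_compare(a: Element, b: Element) -> int:
    """Apply unit comparators in order; fall through to the next one only on SIBLING."""
    for unit in UNIT_COMPARATORS:
        digit = unit(a, b)
        if digit != SIBLING:
            return digit  # parent or unrelated: stop comparing this pair
    return SIBLING


def build_tree(elements: List[Element]) -> Element:
    """Assumed tree construction: attach each element to the nearest open ancestor."""
    root = Element("<document>", float("inf"), True, -1)  # outranks every element
    stack = [root]
    for el in elements:
        while master_compare(stack[-1], el) != PARENT:
            stack.pop()
        stack[-1].children.append(el)
        stack.append(el)
    return root


def render(node: Element, depth: int = 0) -> List[str]:
    """Plain-text stand-in for the claimed visualization of the hierarchy tree."""
    lines = ["  " * depth + node.text]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


doc = [
    Element("1. Introduction", 16, True, 0),
    Element("Background text", 11, False, 0),
    Element("1.1 Scope", 13, True, 0),
    Element("Scope text", 11, False, 0),
    Element("2. Methods", 16, True, 0),
]
tree = build_tree(doc)
print("\n".join(render(tree)))
```

In this sketch the two top-level headings match on font size, weight, and indent, so every unit comparator reports a sibling relationship and `master_compare` returns the sibling digit only after the whole sequence is exhausted. A heading compared against its body text is resolved by the first comparator alone.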