CPC G06V 30/414 (2022.01) [G06V 10/50 (2022.01); G06V 10/778 (2022.01); G06V 30/18086 (2022.01)] | 20 Claims |
1. A method for processing a document having one or more pages, comprising:
receiving an unstructured document;
recognizing a plurality of textual blocks on at least a portion of a page of the unstructured document;
generating a plurality of bounding boxes, each bounding box surrounding and corresponding to one of the plurality of textual blocks and having coordinates of a plurality of vertices;
determining a plurality of search paths, each search path having coordinates of two endpoints and connecting at least two bounding boxes; and
generating a graph representation of the at least a portion of the page, the graph representation including the plurality of textual blocks, the coordinates of the plurality of vertices of each bounding box and the coordinates of the two endpoints of each search path.
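By way of illustration only, the steps of claim 1 after text recognition can be sketched as follows. The claim does not specify how search paths are chosen, so this sketch assumes a simple nearest-neighbor rule between bounding-box centers. All names here (`BoundingBox`, `SearchPath`, `nearest_neighbor_paths`, `build_graph`) are hypothetical and not taken from the patent, and the recognition step is assumed to have already produced text with axis-aligned box coordinates.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box around one textual block (hypothetical structure)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def vertices(self) -> List[Point]:
        # Four corners, clockwise from the top-left corner.
        return [(self.x0, self.y0), (self.x1, self.y0),
                (self.x1, self.y1), (self.x0, self.y1)]

    @property
    def center(self) -> Point:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


@dataclass(frozen=True)
class SearchPath:
    """Segment with two endpoints connecting two bounding boxes."""
    source: int
    target: int
    start: Point
    end: Point


def nearest_neighbor_paths(boxes: List[BoundingBox]) -> List[SearchPath]:
    """Assumed rule: link each box to the box whose center is closest."""
    paths: List[SearchPath] = []
    seen = set()
    for i, a in enumerate(boxes):
        best, best_dist = None, float("inf")
        for j, b in enumerate(boxes):
            if i == j:
                continue
            dist = (a.center[0] - b.center[0]) ** 2 + (a.center[1] - b.center[1]) ** 2
            if dist < best_dist:
                best, best_dist = j, dist
        if best is None:
            continue  # only one box on the page, nothing to connect
        key = (min(i, best), max(i, best))
        if key not in seen:  # skip a path already added from the other side
            seen.add(key)
            paths.append(SearchPath(key[0], key[1],
                                    boxes[key[0]].center, boxes[key[1]].center))
    return paths


def build_graph(boxes: List[BoundingBox]) -> Dict[str, list]:
    """Graph holding block text, box vertices, and search-path endpoints."""
    paths = nearest_neighbor_paths(boxes)
    return {
        "nodes": [{"id": i, "text": b.text, "vertices": b.vertices}
                  for i, b in enumerate(boxes)],
        "edges": [{"source": p.source, "target": p.target,
                   "endpoints": [p.start, p.end]} for p in paths],
    }
```

Calling `build_graph` on a list of boxes returns a dictionary whose nodes carry the block text and vertex coordinates and whose edges carry the two endpoint coordinates of each search path, mirroring the three elements recited in the final step. Other path-selection rules, such as horizontal or vertical scans from each box edge, would fit the same structure.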