| CPC G06F 40/103 (2020.01) [G06F 16/3329 (2019.01); G06F 40/177 (2020.01); G06V 30/412 (2022.01); G06V 30/413 (2022.01)] | 20 Claims |

|
1. A computer-implemented method for extracting tabular data included in a source document, the method comprising:
receiving the source document as an input to a document classifier;
receiving a set of desired keywords provided by a business enterprise;
determining, by the document classifier and in response to receiving the source document, a type of the source document;
identifying, based on the determined type of the source document, a plurality of regions containing the tabular data in the source document, wherein the plurality of regions comprises at least a first region that includes one or more extracted headers and at least a second region that includes values corresponding to the one or more extracted headers;
augmenting the one or more extracted headers and the values with spatial words that describe a spatial relationship between the extracted headers and the values;
using a natural language model to answer queries formulated using the spatial words and the augmented one or more extracted headers;
associating values with respective extracted headers using the answers to the queries to generate an output; and
formatting the output, wherein the formatted output presents values associated with one or more desired keywords from the set of desired keywords provided by the business enterprise.
|
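The augmenting, querying, and associating steps of claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation, and every name in it (`Cell`, `spatial_word`, `augment`, `answer`, `extract`) is hypothetical. It assumes the document classifier and region identification have already produced header and value cells with grid positions, and it substitutes a simple string-matching function for the natural language model, so that the example runs without any external model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Cell:
    text: str
    row: int  # assumed grid row within the identified table region
    col: int  # assumed grid column within the identified table region


def spatial_word(header: Cell, value: Cell) -> Optional[str]:
    """Return a word describing where `value` sits relative to `header`."""
    if value.col == header.col and value.row > header.row:
        return "below"
    if value.row == header.row and value.col > header.col:
        return "right of"
    return None


def augment(headers: List[Cell], values: List[Cell]) -> List[str]:
    """Build sentences that join values and headers with spatial words."""
    sentences = []
    for v in values:
        for h in headers:
            word = spatial_word(h, v)
            if word is not None:
                sentences.append(f"{v.text} is {word} {h.text}")
    return sentences


def answer(query: str, context: List[str]) -> List[str]:
    """Stand-in for the natural language model (an assumption of this sketch).

    Expects queries of the form "What is <spatial word> <header>?" and
    returns the values whose sentence ends with that phrase.
    """
    suffix = " is " + query[len("What is "):-1]
    return [s[: -len(suffix)] for s in context if s.endswith(suffix)]


def extract(headers: List[Cell], values: List[Cell],
            desired_keywords: Set[str]) -> Dict[str, List[str]]:
    """Associate values with headers, keeping only the desired keywords."""
    context = augment(headers, values)
    output: Dict[str, List[str]] = {}
    for h in headers:
        if h.text not in desired_keywords:
            continue
        found: List[str] = []
        for word in ("below", "right of"):
            found += answer(f"What is {word} {h.text}?", context)
        output[h.text] = found
    return output


headers = [Cell("Invoice No", 0, 0), Cell("Total", 0, 1)]
values = [Cell("INV-7", 1, 0), Cell("120.00", 1, 1)]
result = extract(headers, values, {"Total"})
print(result)
```

In this sketch the header row sits above the value row, so each value is described as "below" its header. A real system would presumably derive the spatial words from layout coordinates and pass the queries to a trained question-answering model in place of `answer`.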