CPC G06F 16/93 (2019.01) [G06N 3/08 (2013.01); G06V 30/412 (2022.01); G06V 30/416 (2022.01); G06V 30/418 (2022.01)] | 20 Claims |
1. A method comprising:
creating, by a processor, a database of a plurality of historical documents;
creating, by the processor, cached metadata about the plurality of historical documents including regions of interest of objects, target data types, bag of words, identifier schema and known key-value pairs;
storing, by the processor, the cached metadata about the plurality of historical documents in a relational database to increase computation speed;
receiving, by the processor, a new document having second content and regions of interest;
comparing, by the processor, the second content of the new document to first content in one or more of the plurality of historical documents;
creating, by the processor, match metrics for each of the one or more of a plurality of historical documents, wherein the match metrics are based on a percentage of the second content that matches the first content;
verifying, by the processor, the matching by comparing the match metrics across the plurality of historical documents;
extracting, by the processor, the second content from the regions of interest in the new document based on the matching;
checking, by the processor, the integrity of the second content in the new document against the first content in the one or more of a plurality of historical documents; and
transmitting, by the processor, the second content to a document processing system,
wherein the document processing system automatically further prepares tax documents using the second content.
|
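The caching, match-metric, and verification steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not specify how content is tokenized, how the metadata is laid out, or how match metrics are compared across documents. The table name `doc_metadata`, the helper names, the whitespace tokenizer, the sample documents, and the `min_percent` and `min_margin` thresholds are all hypothetical choices made for illustration.

```python
import json
import sqlite3


def tokenize(text):
    """Lowercase whitespace tokenization; an assumed stand-in for real parsing."""
    return text.lower().split()


def build_cache(conn, historical_docs):
    """Cache a bag of words per historical document in a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS doc_metadata "
        "(doc_id TEXT PRIMARY KEY, bag_of_words TEXT)"
    )
    for doc_id, text in historical_docs.items():
        bag = sorted(set(tokenize(text)))
        conn.execute(
            "INSERT OR REPLACE INTO doc_metadata VALUES (?, ?)",
            (doc_id, json.dumps(bag)),
        )
    conn.commit()


def match_metrics(conn, new_text):
    """Percentage of the new document's tokens found in each cached bag."""
    tokens = tokenize(new_text)
    metrics = {}
    rows = conn.execute("SELECT doc_id, bag_of_words FROM doc_metadata")
    for doc_id, bag_json in rows:
        bag = set(json.loads(bag_json))
        matched = sum(1 for t in tokens if t in bag)
        metrics[doc_id] = 100.0 * matched / len(tokens) if tokens else 0.0
    return metrics


def verify_match(metrics, min_percent=60.0, min_margin=10.0):
    """Accept the best match only if it clears an assumed threshold and
    beats the runner-up by an assumed margin; otherwise return None."""
    ranked = sorted(metrics.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] < min_percent:
        return None
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < min_margin:
        return None
    return ranked[0][0]


# Hypothetical sample data, in-memory database
conn = sqlite3.connect(":memory:")
build_cache(conn, {
    "w2_2021": "wage and tax statement employer identification number wages tips",
    "1099_2021": "miscellaneous income payer recipient rents royalties",
})
metrics = match_metrics(conn, "Wage and Tax Statement wages tips 52000")
best = verify_match(metrics)
```

In this sketch, six of the seven tokens in the new document appear in the cached bag for `w2_2021` and none appear in the bag for `1099_2021`, so `verify_match` selects `w2_2021`. Comparing the top score against the runner-up is one plausible reading of "comparing the match metrics across the plurality of historical documents"; the extraction and integrity-check steps would then use the regions of interest and known key-value pairs cached for the selected document.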