US 11,860,950 B2
Document matching and data extraction
David A. Wyle, Corona Del Mar, CA (US); Alexander James Sadovsky, Denver, CO (US); and William W. Hosek, Laguna Niguel, CA (US)
Assigned to SUREPREP, LLC, Irvine, CA (US)
Filed by Sureprep, LLC, Irvine, CA (US)
Filed on Mar. 30, 2021, as Appl. No. 17/217,917.
Prior Publication US 2022/0318315 A1, Oct. 6, 2022
Int. Cl. G06F 16/93 (2019.01); G06N 3/08 (2023.01); G06V 30/412 (2022.01); G06V 30/416 (2022.01); G06V 30/418 (2022.01)
CPC G06F 16/93 (2019.01) [G06N 3/08 (2013.01); G06V 30/412 (2022.01); G06V 30/416 (2022.01); G06V 30/418 (2022.01)]
20 Claims
 
1. A method comprising:
creating, by a processor, a database of a plurality of historical documents;
creating, by the processor, cached metadata about the plurality of historical documents including regions of interest of objects, target data types, bag of words, identifier schema and known key-value pairs;
storing, by the processor, the cached metadata about the plurality of historical documents in a relational database to increase computation speed;
receiving, by the processor, a new document having second content and regions of interest;
comparing, by the processor, the second content of the new document to first content in one or more of the plurality of historical documents;
creating, by the processor, match metrics for each of the one or more of a plurality of historical documents, wherein the match metrics are based on a percentage of the second content that matches the first content;
verifying, by the processor, the matching by comparing the match metrics across the plurality of historical documents;
extracting, by the processor, the second content from the regions of interest in the new document based on the matching;
checking, by the processor, the integrity of the second content in the new document against the first content in the one or more of a plurality of historical documents; and
transmitting, by the processor, the second content to a document processing system,
wherein the document processing system automatically further prepares tax documents using the second content.
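
The matching, verification, and extraction steps recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation: the names `HistoricalDoc`, `match_metric`, `best_match`, and `extract`, the `min_score` and `min_margin` thresholds, and the representation of regions of interest as token spans are all assumptions made for illustration. The claim itself only requires that the match metric be based on a percentage of the new document's content that matches the historical content.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class HistoricalDoc:
    doc_id: str
    bag_of_words: Counter      # cached metadata: token -> count
    regions_of_interest: dict  # hypothetical: field name -> (start, end) token span


def tokenize(text: str) -> list[str]:
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def match_metric(new_tokens: list[str], hist: HistoricalDoc) -> float:
    """Percentage of the new document's tokens that also occur in the historical document."""
    if not new_tokens:
        return 0.0
    new_bag = Counter(new_tokens)
    # Counter & Counter is a multiset intersection (minimum of the counts).
    overlap = sum((new_bag & hist.bag_of_words).values())
    return 100.0 * overlap / len(new_tokens)


def best_match(new_text: str, history: list[HistoricalDoc],
               min_score: float = 60.0, min_margin: float = 10.0):
    """Score every historical document, then verify the top match by comparing
    its metric against the runner-up. Thresholds are illustrative assumptions."""
    tokens = tokenize(new_text)
    scored = sorted(((match_metric(tokens, h), h) for h in history),
                    key=lambda pair: pair[0], reverse=True)
    if not scored:
        return None
    top_score, top_doc = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    if top_score >= min_score and top_score - runner_up >= min_margin:
        return top_doc, top_score
    return None


def extract(new_text: str, template: HistoricalDoc) -> dict:
    """Pull content out of the new document at the matched template's regions of interest."""
    tokens = new_text.split()
    return {name: " ".join(tokens[start:end])
            for name, (start, end) in template.regions_of_interest.items()}


# Usage with made-up sample data.
w2 = HistoricalDoc(
    "w2",
    Counter(tokenize("wages tips other compensation federal income tax withheld")),
    {"wages": (4, 5)},
)
interest = HistoricalDoc("1099-int", Counter(tokenize("interest income payer recipient")), {})

new_text = "Wages tips other compensation 52000 federal income tax withheld 6100"
result = best_match(new_text, [w2, interest])
if result is not None:
    matched_doc, score = result
    print(matched_doc.doc_id, score, extract(new_text, matched_doc))
```

In this sketch the cached bag of words stands in for the claim's cached metadata, so each comparison avoids re-reading the historical documents. A production system would more plausibly define regions of interest as page coordinates than as token positions.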