CPC G06V 30/19153 (2022.01) | 20 Claims |
1. One or more non-transitory computer-readable media storing computer-executable instructions that, when executed by a processor, perform a method for applying machine learning techniques to classify and validate documents based on one or more expense rule sets and one or more external data validation services, the method comprising:
receiving, by a document classification service, one or more document images associated with expenses incurred in connection with a reimbursable event;
for each received document image in the one or more document images:
transmitting image data associated with the received document image to an optical character recognition image processor, the optical character recognition image processor configured to recognize textual contents and coordinates associated with graphical and textual information contained within the received document image;
receiving optical character recognition data from the optical character recognition image processor, wherein the optical character recognition data comprises the textual contents and coordinates associated with graphical and textual information;
transmitting the optical character recognition data to a text tokenizer;
receiving tokenized text from the text tokenizer, wherein the tokenized text comprises text entities corresponding to expense details associated with the expenses incurred in connection with a reimbursable event;
transmitting the tokenized text and the coordinates associated with graphical and textual information to a text feature generator;
receiving one or more text feature vectors from the text feature generator;
transmitting the one or more text feature vectors to a document classifier;
receiving a document classification from the document classifier, wherein the document classifier employs a document classification machine learning model that is trained on previously validated documents;
extracting extracted document fields from the one or more text feature vectors based on a document extraction machine learning model and the one or more expense rule sets;
based on the extracted document fields and the document classification, validating a document based on the received document image, the received document classification, and the one or more data validation services to produce a validation result; and
based on the validation result, automatically generating a reimbursement instruction corresponding to the received document classification and the extracted document fields.
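The per-image steps recited above can be read as a linear pipeline. The following is a minimal illustrative sketch of that flow, not the claimed implementation. Every name in it is hypothetical (`OcrWord`, `fake_ocr`, `toy_features`, `process_document`, the `max_amount` rule, and so on), and the OCR processor, the machine learning models, and the external data validation service are replaced by trivial stand-ins so that the example runs on its own.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height (assumed coordinate layout)


@dataclass
class OcrWord:
    """One recognized word plus its coordinates (hypothetical structure)."""
    text: str
    box: Box


def fake_ocr(image: bytes) -> List[OcrWord]:
    # Stand-in for the OCR image processor: treats the bytes as plain text
    # and assigns each word a made-up box on a single line.
    return [OcrWord(t, (10 * i, 0, 8, 8)) for i, t in enumerate(image.decode().split())]


def simple_tokenize(words: List[OcrWord]) -> List[str]:
    # Stand-in for the text tokenizer: lowercase and trim ':' and '$'.
    return [w.text.lower().strip(":$") for w in words]


def toy_features(tokens: List[str], boxes: List[Box]) -> List[float]:
    # Stand-in for the text feature generator. The coordinates are accepted
    # but unused here; a real generator would encode layout as well.
    amounts = [float(t) for t in tokens if t.replace(".", "", 1).isdigit()]
    return [float(len(tokens)), max(amounts, default=0.0), float("hotel" in tokens)]


def toy_classify(features: List[float]) -> str:
    # Stand-in for the trained document classification model.
    return "lodging" if features[2] else "other"


def toy_extract(features: List[float], rules: Dict[str, float]) -> Dict[str, object]:
    # Stand-in for the extraction model combined with an expense rule set.
    amount = features[1]
    return {"amount": amount, "within_limit": amount <= rules["max_amount"]}


def toy_validate(image: bytes, doc_class: str, fields: Dict[str, object]) -> bool:
    # Stand-in for the external data validation services.
    return bool(image) and fields["amount"] > 0 and bool(fields["within_limit"])


def process_document(image: bytes, rules: Dict[str, float]) -> Optional[Dict[str, object]]:
    """Run one document image through the steps in claim order."""
    words = fake_ocr(image)                                   # OCR text + coordinates
    tokens = simple_tokenize(words)                           # tokenized text
    features = toy_features(tokens, [w.box for w in words])   # text feature vector
    doc_class = toy_classify(features)                        # document classification
    fields = toy_extract(features, rules)                     # extracted document fields
    if not toy_validate(image, doc_class, fields):            # validation result
        return None
    # Reimbursement instruction tied to the classification and extracted fields.
    return {"action": "reimburse", "document_class": doc_class, "amount": fields["amount"]}


if __name__ == "__main__":
    print(process_document(b"Hotel Total: $182.50", {"max_amount": 500.0}))
```

In this sketch a document that fails validation produces no reimbursement instruction (`None`). That behavior is an assumption of the example; the claim only states that the instruction is generated based on the validation result.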