CPC G06V 30/413 (2022.01) [G06N 20/00 (2019.01)] | 19 Claims
1. A method for classifying a document in real-time, comprising:
identifying a first section of the document likely to contain text based on a comparison of a ratio of dark space to light space in an image of the document to a predetermined threshold;
performing optical character recognition within the identified first section of the document to identify a first set of words within the identified first section of the document;
extracting the first set of words from the identified first section of the document;
selecting a first subset of the first set of words for classifying the document based on a preconfigured option, wherein the preconfigured option comprises at least one of selecting the first subset of the first set of words based on a number of characters in each word of the first set of words, selecting the first subset of the first set of words based on an order of each word of the first set of words, and selecting the first subset of the first set of words randomly;
classifying the document by inputting the selected first subset of the first set of words into one or more machine learning models, wherein the one or more machine learning models determine a classification of the document; and
transmitting the document and the determined classification of the document to an external server.
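
For illustration only, the section-identification and word-selection steps recited in claim 1 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the grayscale cutoff `dark_below`, the threshold value, the parameters `k` and `min_chars`, and the direction of the threshold comparison (a ratio at or above the threshold is treated as likely text) are all hypothetical and do not appear in the claim.

```python
import random
from typing import List, Optional, Sequence


def dark_to_light_ratio(region: Sequence[Sequence[int]], dark_below: int = 128) -> float:
    """Ratio of dark pixels to light pixels in a grayscale region (values 0-255).

    Assumption: a pixel counts as "dark" when its value is below `dark_below`.
    """
    dark = sum(1 for row in region for px in row if px < dark_below)
    light = sum(1 for row in region for px in row if px >= dark_below)
    return dark / light if light else float("inf")


def likely_contains_text(region: Sequence[Sequence[int]], threshold: float = 0.05) -> bool:
    """Compare the dark/light ratio to a predetermined threshold.

    Assumption: the claim does not state the direction of the comparison;
    here a ratio at or above the threshold marks the region as likely text.
    """
    return dark_to_light_ratio(region) >= threshold


def select_words(
    words: List[str],
    option: str,
    k: int = 3,
    min_chars: int = 5,
    seed: Optional[int] = None,
) -> List[str]:
    """Select a subset of OCR'd words according to a preconfigured option.

    Hypothetical option names:
      "length": keep words with at least `min_chars` characters
      "order":  keep the first `k` words in reading order
      "random": keep `k` words sampled at random without replacement
    """
    if option == "length":
        return [w for w in words if len(w) >= min_chars]
    if option == "order":
        return words[:k]
    if option == "random":
        return random.Random(seed).sample(words, min(k, len(words)))
    raise ValueError(f"unknown option: {option!r}")


if __name__ == "__main__":
    region = [[0, 255, 255, 255], [255, 255, 255, 255]]  # 1 dark pixel, 7 light pixels
    words = ["Invoice", "No", "12", "Total", "Due"]
    if likely_contains_text(region):
        print(select_words(words, "length"))
        print(select_words(words, "order", k=2))
```

The selected subset would then be passed to the one or more machine learning models of the classifying step; the OCR, classification, and transmission steps are omitted here because the claim does not tie them to any particular library or model.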