US 11,948,383 B2
Document classification of files on the client side before upload
William J. Farmer, II, Richmond, VA (US); Sreenidhi Narayanamangalathu Kesavan, Glen Allen, VA (US); Dimitri Bilenkin, Moseley, VA (US); William Clayton Jackson, Glen Allen, VA (US); Karthikeyan Palanivelu, Glen Allen, VA (US); and Siddharth Mangalik, North Grafton, MA (US)
Assigned to Capital One Services, LLC, McLean, VA (US)
Filed by Capital One Services, LLC, McLean, VA (US)
Filed on Apr. 6, 2021, as Appl. No. 17/223,922.
Prior Publication US 2022/0318547 A1, Oct. 6, 2022
Int. Cl. G06V 30/413 (2022.01); G06N 20/00 (2019.01)
CPC G06V 30/413 (2022.01) [G06N 20/00 (2019.01)]
19 Claims
 
1. A method for classifying a document in real-time, comprising:
identifying a first section of the document likely to contain text based on a comparison of a ratio of dark space to light space in an image of the document to a predetermined threshold;
performing optical character recognition within the identified first section of the document to identify a first set of words within the identified first section of the document;
extracting the first set of words from the identified first section of the document;
selecting a first subset of the first set of words for classifying the document based on a preconfigured option, wherein the preconfigured option comprises at least one of selecting the first subset of the first set of words based on a number of characters in each word of the first set of words, selecting the first subset of the first set of words based on an order of each word of the first set of words, and selecting the first subset of the first set of words randomly;
classifying the document by inputting the selected first subset of the first set of words into one or more machine learning models, wherein the one or more machine learning models determine a classification of the document; and
transmitting the document and the determined classification of the document to an external server.
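
The first and fourth steps of claim 1 (locating a likely text section from a dark-to-light ratio, and selecting a subset of the recognized words) can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the grayscale cutoff of 128, the horizontal-band scan, the direction of the threshold comparison (a ratio above the threshold is treated as "likely text"), and the `limit` and `min_chars` parameters are all assumptions made for illustration. The optical character recognition, classification, and transmission steps are omitted.

```python
import random
from typing import List, Optional, Sequence, Tuple

Image = Sequence[Sequence[int]]  # grayscale rows, pixel values 0 (black) to 255 (white)


def dark_ratio(rows: Image, dark_below: int = 128) -> float:
    """Ratio of dark pixels to light pixels; the cutoff of 128 is an assumption."""
    dark = sum(1 for row in rows for px in row if px < dark_below)
    light = sum(len(row) for row in rows) - dark
    return dark / light if light else float("inf")


def find_text_sections(image: Image, band_height: int,
                       threshold: float) -> List[Tuple[int, int]]:
    """Return (start_row, end_row) bands whose dark/light ratio exceeds threshold.

    Scanning fixed-height horizontal bands is a hypothetical way to define
    "sections"; the claim does not specify how sections are delimited.
    """
    sections = []
    for top in range(0, len(image), band_height):
        band = image[top:top + band_height]
        if dark_ratio(band) > threshold:
            sections.append((top, top + len(band)))
    return sections


def select_subset(words: Sequence[str], option: str, limit: int = 5,
                  min_chars: int = 4, seed: Optional[int] = None) -> List[str]:
    """Pick a subset of words using one of the three preconfigured options."""
    if option == "length":   # by number of characters in each word
        return [w for w in words if len(w) >= min_chars][:limit]
    if option == "order":    # by position: keep the first `limit` words
        return list(words[:limit])
    if option == "random":   # random selection without replacement
        return random.Random(seed).sample(list(words), min(limit, len(words)))
    raise ValueError(f"unknown option: {option!r}")


# Toy 4x8 image: two blank rows followed by two rows containing dark pixels.
image = [
    [255] * 8,
    [255] * 8,
    [0, 255, 0, 255, 0, 255, 255, 255],
    [255, 0, 255, 0, 255, 255, 255, 255],
]
sections = find_text_sections(image, band_height=2, threshold=0.2)

# Stand-in for OCR output from the identified section.
words = ["a", "statement", "of", "account", "balance"]
by_length = select_subset(words, "length", limit=2)
by_order = select_subset(words, "order", limit=3)
by_random = select_subset(words, "random", limit=2, seed=0)
```

In this toy input, only the lower band contains dark pixels, so it is the only band passed on as a candidate text section. Restricting recognition to such sections and classifying from a small word subset keeps the client-side work small, which is consistent with the claim's aim of classifying in real time before upload.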