CPC G06V 30/19173 (2022.01) [G06V 30/1448 (2022.01); G06V 30/1463 (2022.01); G06V 30/1801 (2022.01); G06V 30/26 (2022.01); G06V 30/413 (2022.01)] | 20 Claims |
1. A method comprising:
receiving an image of a document;
preprocessing the image to align individual pages of the document with an upright vector;
generating first machine readable content based at least in part on a first optical character recognition system, the first machine readable content representing the document;
generating second machine readable content based at least in part on a second optical character recognition system, the second machine readable content representing the document;
generating third machine readable content based at least in part on the first machine readable content and the second machine readable content;
generating a first classification for the document based at least in part on the third machine readable content and a first classification system;
generating a second classification for the document based at least in part on the third machine readable content and a second classification system;
generating an assigned classification for the document based at least in part on the first classification and the second classification;
generating extracted data from the third machine readable content, the extracted data associated with one or more key value descriptors assigned based at least in part on the assigned classification;
generating fourth machine readable content by removing content associated with the extracted data from the third machine readable content; and
generating additional extracted data from the fourth machine readable content, the additional extracted data associated with one or more of the assigned key value descriptors.
|
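The data flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the two OCR outputs are passed in as plain strings, the image preprocessing step is omitted, and every name and heuristic here (`merge_ocr`, `classify_by_keyword`, `classify_by_pattern`, `assign_classification`, `extract_once`, `DESCRIPTORS`, the line-by-line merge rule, and the regular expressions) is a hypothetical stand-in chosen only for illustration.

```python
import re
from itertools import zip_longest

# Hypothetical key value descriptors per document class (illustrative patterns only).
DESCRIPTORS = {
    "invoice": {
        "invoice_number": re.compile(r"Invoice\s*(?:No\.?|#)\s*:?\s*(\w+)", re.I),
        "total": re.compile(r"Total\s*:?\s*\$?(\d+(?:\.\d{2})?)", re.I),
    },
}


def merge_ocr(first: str, second: str) -> str:
    """Build third content: per line, keep the candidate with more
    alphanumeric characters (an assumed heuristic; ties go to the first)."""
    merged = []
    for a, b in zip_longest(first.splitlines(), second.splitlines(), fillvalue=""):
        score_a = sum(ch.isalnum() for ch in a)
        score_b = sum(ch.isalnum() for ch in b)
        merged.append(a if score_a >= score_b else b)
    return "\n".join(merged)


def classify_by_keyword(text: str) -> str:
    """Stand-in for a first classification system."""
    return "invoice" if "invoice" in text.lower() else "unknown"


def classify_by_pattern(text: str) -> str:
    """Stand-in for a second classification system."""
    return "invoice" if re.search(r"total\s*:", text, re.I) else "unknown"


def assign_classification(first_label: str, second_label: str) -> str:
    """Combine both labels: use agreement, else prefer a known label (assumed rule)."""
    if first_label == second_label:
        return first_label
    return first_label if first_label != "unknown" else second_label


def extract_once(text: str, label: str):
    """Extract one value per descriptor and return the text with each
    matched span removed."""
    extracted = {}
    remaining = text
    for key, pattern in DESCRIPTORS.get(label, {}).items():
        match = pattern.search(remaining)
        if match:
            extracted[key] = match.group(1)
            remaining = remaining[:match.start()] + remaining[match.end():]
    return extracted, remaining


def process(first_ocr: str, second_ocr: str) -> dict:
    third = merge_ocr(first_ocr, second_ocr)             # third content
    label = assign_classification(
        classify_by_keyword(third), classify_by_pattern(third)
    )                                                    # assigned classification
    extracted, fourth = extract_once(third, label)       # extracted data, fourth content
    additional, _ = extract_once(fourth, label)          # additional extracted data
    return {"classification": label, "extracted": extracted, "additional": additional}


if __name__ == "__main__":
    first = "Invoice No: A123\nTotal: $10.00\nTot: $25.5"
    second = "Invoice No: A12\nTotal: $10.00\nTotal: $25.50"
    print(process(first, second))
```

In this sketch the second extraction pass can surface a value that the first pass did not return, because the first match for each descriptor has already been removed from the fourth content. A real system would likely use a more robust merge and classification strategy than the ones assumed here.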