CPC G06V 30/416 (2022.01) [G06F 16/93 (2019.01); G06N 3/049 (2013.01); G06N 20/00 (2019.01); G06V 30/412 (2022.01); G06Q 30/04 (2013.01)]
20 Claims
1. A computerized system comprising:
one or more processors; and
a non-transitory computer storage memory having computer-executable instructions stored thereon which, when executed by the one or more processors, implement a method comprising:
receiving, at a trained first machine learning model, image data of a multi-page document, the multi-page document including a first document that includes one or more first pages of the multi-page document and a second document that includes one or more second pages of the multi-page document;
in response to the receiving, at the trained first machine learning model, of the image data, generating a first feature vector embedding for the one or more first pages of the multi-page document and a second feature vector embedding for the one or more second pages of the multi-page document, the first and second feature vector embeddings being embedded based on learned patterns for document characteristics;
based on feeding the first feature vector embedding and the second feature vector embedding to a second machine learning model, determining whether each page, of the one or more first pages and the one or more second pages, is a continuation page of a previous page of a document or whether each page is a starting page of a new document; and
based on the determining, distinguishing the first document from the second document.
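
The final two steps of the claim (classifying each page as a starting page or a continuation page, then distinguishing the documents) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration and is not taken from the patent: the page embeddings are hand-written toy vectors standing in for the output of the trained first machine learning model, and `toy_is_start_page` is a hypothetical cosine-similarity threshold rule standing in for the second machine learning model, whose actual architecture the claim does not specify.

```python
import math
from typing import Callable, List, Optional, Sequence

Embedding = List[float]


def cosine(a: Embedding, b: Embedding) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_is_start_page(prev: Optional[Embedding], cur: Embedding,
                      threshold: float = 0.8) -> bool:
    """Hypothetical stand-in for the second model.

    A page is treated as the start of a new document when it has no
    predecessor or when its embedding is dissimilar to the previous page's.
    """
    if prev is None:
        return True
    return cosine(prev, cur) < threshold


def split_documents(
    embeddings: Sequence[Embedding],
    is_start: Callable[[Optional[Embedding], Embedding], bool] = toy_is_start_page,
) -> List[List[int]]:
    """Group page indices into documents using per-page start/continuation decisions."""
    documents: List[List[int]] = []
    prev: Optional[Embedding] = None
    for index, embedding in enumerate(embeddings):
        if not documents or is_start(prev, embedding):
            documents.append([index])        # starting page of a new document
        else:
            documents[-1].append(index)      # continuation of the previous page
        prev = embedding
    return documents


# Toy embeddings (assumed): pages 0-1 resemble each other, as do pages 2-3.
page_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
boundaries = split_documents(page_embeddings)
print(boundaries)
```

With these toy vectors, pages 0 and 1 are grouped as one document and pages 2 and 3 as another. A trained classifier could be passed in through the `is_start` parameter in place of the threshold rule.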