US 11,789,990 B1
Automated splitting of document packages and identification of relevant documents
Meena AbdelMaseeh Adly Fouad, Mississauga (CA); Zhihong Zeng, Acton, MA (US); Anirudh Prabakaran, Atlanta, GA (US); Samriddhi Shakya, Washington, DC (US); Tom Sebastian, Bangalore (IN); Tallam Sai Teja, Andhra Pradesh (IN); Simon Ioffe, Boston, MA (US); and Narasimha Goli, Boston, MA (US)
Assigned to Iron Mountain Incorporated, Boston, MA (US)
Filed by IRON MOUNTAIN INCORPORATED, Boston, MA (US)
Filed on Apr. 29, 2022, as Appl. No. 17/733,581.
Int. Cl. G06F 16/30 (2019.01); G06F 16/35 (2019.01); G06V 30/416 (2022.01); G06F 16/383 (2019.01); G06N 3/045 (2023.01)
CPC G06F 16/35 (2019.01) [G06F 16/383 (2019.01); G06N 3/045 (2023.01); G06V 30/416 (2022.01)]
20 Claims
1. A computer-implemented method for processing an ordered plurality of document pages, the method comprising:
producing, for each document page of the ordered plurality of document pages, an image of the document page and a representation of text from the document page;
generating, for each document page of the ordered plurality of document pages, and based on the image of the document page and the representation of text from the document page, an embedding of the document page; and
generating, for each document page among a subset of the ordered plurality of document pages, a label for the document page that indicates whether the document page is a document first page,
wherein, for each document page among the subset of the ordered plurality of document pages, generating the label for the document page is based on the embedding of the document page, the embedding of each of at least one document page that precedes the document page in the ordered plurality of document pages, and the embedding of each of at least one document page that follows the document page in the ordered plurality of document pages.
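
For illustration only, the flow recited in claim 1 can be sketched in Python. Every name below (`embed_page`, `toy_classifier`, `label_first_pages`), the one-neighbor context window, and the cosine-similarity rule are hypothetical stand-ins that do not come from the patent; an actual system would presumably use a trained multimodal encoder and a trained classifier in their place.

```python
from math import sqrt
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def embed_page(image: Sequence[float], text: str, dim: int = 8) -> Vector:
    """Toy page embedding built from both the page image and its text.

    Assumption: `image` is a flat sequence of pixel intensities and `text`
    is the text extracted from the page (for example by OCR).
    """
    text_part = [0.0] * dim
    for token in text.lower().split():
        # Deterministic bucket per token (built-in hash() is salted per run).
        text_part[sum(token.encode("utf-8")) % dim] += 1.0
    image_part = [sum(image) / len(image) if image else 0.0]
    return image_part + text_part


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_classifier(prev: Vector, cur: Vector, nxt: Vector) -> bool:
    """Placeholder rule, not a trained model: call a page a first page when
    it resembles the following page more than the preceding page."""
    return cosine(cur, nxt) > cosine(cur, prev)


def label_first_pages(
    embeddings: List[Vector],
    classify: Callable[[Vector, Vector, Vector], bool] = toy_classifier,
) -> Dict[int, bool]:
    """Label each page that has both a preceding and a following page.

    Each label depends on the page's own embedding plus the embeddings of
    one preceding and one following page in the ordered sequence.
    """
    labels: Dict[int, bool] = {}
    for i in range(1, len(embeddings) - 1):
        labels[i] = classify(embeddings[i - 1], embeddings[i], embeddings[i + 1])
    return labels


if __name__ == "__main__":
    # Hand-made embeddings: pages 0-1 look alike, pages 2-4 look alike.
    pages = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
    print(label_first_pages(pages))
```

In this sketch the first and last pages receive no label because they lack a neighbor on one side, which is one possible reading of the claim's "subset of the ordered plurality of document pages". A wider window would pass several preceding and following embeddings to the classifier instead of one of each.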