US 12,277,788 B2
Content extraction based on hop distance within a graph model
Yanfei Dong, Singapore (SG); Yuan Deng, Singapore (SG); Jiazheng Zhang, Singapore (SG); Francesco Gelli, Singapore (SG); Ting Lin, Singapore (SG); Yuzhen Zhuo, Singapore (SG); Hewen Wang, Singapore (SG); and Soujanya Poria, Singapore (SG)
Assigned to PayPal, Inc., San Jose, CA (US)
Filed by PAYPAL, INC., San Jose, CA (US)
Filed on Nov. 9, 2022, as Appl. No. 17/983,908.
Prior Publication US 2024/0153296 A1, May 9, 2024
Int. Cl. G06V 30/19 (2022.01); G06V 10/74 (2022.01); G06V 30/18 (2022.01); G06V 30/414 (2022.01)
CPC G06V 30/1916 (2022.01) [G06V 10/761 (2022.01); G06V 30/18181 (2022.01); G06V 30/414 (2022.01)] 20 Claims
 
1. A system, comprising:
a processor; and
a non-transitory computer-readable medium having stored thereon computer-executable instructions that are executable by the processor to cause the system to perform operations comprising:
determining, for each of a plurality of text bounding boxes in a document, respective text, respective coordinates, and respective input embeddings;
generating a graph of the plurality of text bounding boxes according to the respective coordinates of the plurality of text bounding boxes, the graph comprising a plurality of nodes, each node representative of a respective text bounding box, and a plurality of connections, each connection of the plurality of connections comprising a first respective node, a second respective node, and zero or more respective intermediate nodes that are between the first and second respective nodes;
determining a respective attention value for each connection of the plurality of connections according to a respective quantity of intermediate text bounding boxes in the respective connection;
based on the attention values and a transformer-based machine learning model applied to the respective input embeddings and respective coordinates, determining respective output embeddings for each respective text bounding box; and
based on the respective output embeddings, generating a respective bounding box label for each bounding box.
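The graph-construction and hop-distance steps of claim 1 can be illustrated with a short sketch. The claim says only that the graph is generated "according to the respective coordinates" and that each attention value depends on the number of intermediate boxes in a connection. The edge rule below (connect boxes whose centers lie within `max_dist`) and the decay rule (`decay ** intermediates`) are therefore hypothetical choices, not the patented method. `TextBox`, `build_graph`, `hop_matrix`, and `attention_values` are illustrative names. A breadth-first search gives shortest-path hop counts, and the number of intermediate boxes on a connection is `hops - 1`.

```python
from collections import deque
from dataclasses import dataclass
import math


@dataclass
class TextBox:
    """Hypothetical container for one text bounding box (text plus corner coordinates)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def build_graph(boxes, max_dist):
    """Assumed edge rule: connect two boxes whose centers are at most max_dist apart."""
    n = len(boxes)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            (xi, yi), (xj, yj) = boxes[i].center(), boxes[j].center()
            if math.hypot(xi - xj, yi - yj) <= max_dist:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def hop_matrix(adj):
    """BFS from every node; hops[i][j] is the shortest-path length, or None if unreachable."""
    n = len(adj)
    hops = [[None] * n for _ in range(n)]
    for src in range(n):
        hops[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if hops[src][v] is None:
                    hops[src][v] = hops[src][u] + 1
                    queue.append(v)
    return hops


def attention_values(hops, decay=0.5):
    """Hypothetical mapping from intermediate-box count to an attention value.

    Direct neighbours (0 intermediates) and the box itself get 1.0; each extra
    intermediate box multiplies the value by `decay`; unreachable pairs get 0.0.
    """
    values = []
    for row in hops:
        out = []
        for h in row:
            if h is None:
                out.append(0.0)
            elif h == 0:
                out.append(1.0)
            else:
                out.append(decay ** (h - 1))  # h - 1 = number of intermediate boxes
        values.append(out)
    return values


# Usage: four boxes laid out left to right, 10 units apart, so they form a chain 0-1-2-3.
boxes = [
    TextBox("Invoice", 0, 0, 4, 2),
    TextBox("No.", 10, 0, 14, 2),
    TextBox("12345", 20, 0, 24, 2),
    TextBox("Total", 30, 0, 34, 2),
]
adj = build_graph(boxes, max_dist=12.0)
hops = hop_matrix(adj)
attn = attention_values(hops)
print(attn[0])
```

Computing every pair with one BFS per node is O(n·(n+e)), which is adequate for the few hundred boxes on a typical page. A learned per-hop scalar could replace the fixed `decay` without changing the graph code.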
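The final two steps of claim 1 combine the per-connection attention values with a transformer applied to the input embeddings and coordinates, then label each box from its output embedding. The claim does not say how the attention values enter the transformer. The sketch below is a minimal, single-head illustration under one common assumption: each attention value `a` is added to the attention logit as `log(a)`, so a value of 0 masks the pair and smaller values down-weight distant boxes. Coordinates are assumed to be appended to each text embedding. `hop_biased_attention`, `label_boxes`, the weight matrices, and the "key"/"value" label set are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, v):
    """Multiply a matrix W (list of rows) by a vector v."""
    return [dot(row, v) for row in W]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0, so masked pairs get zero weight
    total = sum(exps)
    return [e / total for e in exps]


def hop_biased_attention(inputs, attn_values, Wq, Wk, Wv):
    """Single-head self-attention whose logits are shifted by log(attention value).

    Assumes every row of attn_values has at least one positive entry (e.g. the box itself).
    Returns the output embeddings and the attention weights.
    """
    q = [matvec(Wq, x) for x in inputs]
    k = [matvec(Wk, x) for x in inputs]
    v = [matvec(Wv, x) for x in inputs]
    scale = math.sqrt(len(q[0]))
    outputs, weights = [], []
    for i in range(len(inputs)):
        logits = [
            dot(q[i], k[j]) / scale + math.log(a) if a > 0 else -math.inf
            for j, a in enumerate(attn_values[i])
        ]
        w = softmax(logits)
        weights.append(w)
        outputs.append([sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))])
    return outputs, weights


def label_boxes(outputs, W_label, label_names):
    """Hypothetical linear classifier: pick the highest-scoring label for each box."""
    labels = []
    for h in outputs:
        scores = matvec(W_label, h)
        labels.append(label_names[scores.index(max(scores))])
    return labels


# Usage: toy 2-d text embeddings with a normalized x-center appended as a coordinate feature.
inputs = [
    [1.0, 0.0, 0.05],  # "Invoice No."
    [0.0, 1.0, 0.35],  # "12345"
    [1.0, 0.0, 0.65],  # "Total"
    [0.0, 1.0, 0.95],  # "$40.00"
]
# Attention values for a left-to-right chain 0-1-2-3 (halved per intermediate box).
attn = [
    [1.0, 1.0, 0.5, 0.25],
    [1.0, 1.0, 1.0, 0.5],
    [0.5, 1.0, 1.0, 1.0],
    [0.25, 0.5, 1.0, 1.0],
]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
outputs, weights = hop_biased_attention(inputs, attn, I3, I3, I3)
labels = label_boxes(outputs, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], ["key", "value"])
print(labels)
```

With a `log(a)` bias, a query that carries no content signal (zero query vector) attends to other boxes in exact proportion to their attention values, so hop distance acts as a structural prior that content similarity can then override. A production model would use multiple heads, stacked layers, and learned weights.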