CPC G06V 30/1916 (2022.01) [G06V 10/761 (2022.01); G06V 30/18181 (2022.01); G06V 30/414 (2022.01)] | 20 Claims |
1. A system, comprising:
a processor; and
a non-transitory computer-readable medium having stored thereon computer-executable instructions that are executable by the processor to cause the system to perform operations comprising:
determining, for each of a plurality of text bounding boxes in a document, respective text, respective coordinates, and respective input embeddings;
generating a graph of the plurality of text bounding boxes according to the respective coordinates of the plurality of text bounding boxes, the graph comprising a plurality of nodes, each node representative of a respective text bounding box, and a plurality of connections, each connection of the plurality of connections comprising a first respective node, a second respective node, and zero or more respective intermediate nodes that are between the first and second respective nodes;
determining a respective attention value for each connection of the plurality of connections according to a respective quantity of intermediate text bounding boxes in the respective connection;
based on the attention values and a transformer-based machine learning model applied to the respective input embeddings and respective coordinates, determining respective output embeddings for each respective text bounding box; and
based on the respective output embeddings, generating a respective bounding box label for each bounding box.
|
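The graph and attention steps recited in claim 1 can be illustrated with a minimal sketch. The claim does not specify how connections are derived from the coordinates, nor how the count of intermediate text bounding boxes maps to an attention value, so everything concrete here is an assumption made for illustration: the `Box` tuple layout, the k-nearest-centers linking rule in `build_adjacency`, the use of shortest paths to count intermediate nodes, and the exponential `decay` weighting in `attention_values`. None of these names or rules come from the claim.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical box layout: (x0, y0, x1, y1). The claim only says "coordinates".
Box = Tuple[float, float, float, float]


def build_adjacency(boxes: List[Box], k: int = 2) -> Dict[int, List[int]]:
    """Link each box to its k nearest boxes by center distance.

    This linking rule is an assumption; the claim only requires that the
    graph be generated according to the box coordinates.
    """
    centers = [((x0 + x1) / 2, (y0 + y1) / 2) for x0, y0, x1, y1 in boxes]
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(boxes))}
    for i, (cx, cy) in enumerate(centers):
        others = sorted(
            (j for j in range(len(boxes)) if j != i),
            key=lambda j: (centers[j][0] - cx) ** 2 + (centers[j][1] - cy) ** 2,
        )
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)  # keep the graph undirected
    return {i: sorted(neighbors) for i, neighbors in adj.items()}


def intermediate_counts(adj: Dict[int, List[int]], start: int) -> Dict[int, int]:
    """Breadth-first search from `start`.

    Returns, for every reachable node, the number of intermediate nodes on
    a shortest path from `start` (hop count minus one).
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return {node: h - 1 for node, h in hops.items() if node != start}


def attention_values(
    adj: Dict[int, List[int]], decay: float = 0.5
) -> Dict[Tuple[int, int], float]:
    """Assign each connection (i, j) the value decay ** (intermediate count).

    The exponential form is an assumed mapping: directly linked boxes get
    1.0, and the value shrinks as more boxes lie between the endpoints.
    """
    values: Dict[Tuple[int, int], float] = {}
    for i in adj:
        for j, between in intermediate_counts(adj, i).items():
            values[(i, j)] = decay ** between
    return values


# Usage: four boxes laid out left to right on one line of a document.
boxes = [(0, 0, 10, 10), (20, 0, 30, 10), (41, 0, 51, 10), (63, 0, 73, 10)]
adjacency = build_adjacency(boxes, k=1)
values = attention_values(adjacency, decay=0.5)
```

In a transformer-based model, values like these would plausibly be applied as a bias or mask on the self-attention scores between the corresponding box embeddings; that wiring is likewise left open by the claim and is not shown here.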