US 12,287,835 B2
Automatically extracting key-value data included in heterogeneous document types using graph representation learning
Badri Nath, Edison, NJ (US); Vijayendra Mysore Shamanna, Cupertino, CA (US); Yashu Seth, Patna (IN); Ravil Kashyap, Bangalore (IN); Kaushal Kishore Hebbar, Bangalore (IN); Henry Thomas Peter, Mountain House, CA (US); and Simha Sadasiva, San Jose, CA (US)
Assigned to Ushur, Inc., Santa Clara, CA (US)
Filed by Ushur, Inc., Santa Clara, CA (US)
Filed on Jul. 28, 2023, as Appl. No. 18/361,687.
Prior Publication US 2025/0036690 A1, Jan. 30, 2025
Int. Cl. G06F 16/93 (2019.01); G06F 16/903 (2019.01); G06F 16/9038 (2019.01); G06F 16/906 (2019.01); G06N 3/0455 (2023.01)
CPC G06F 16/93 (2019.01) [G06F 16/90344 (2019.01); G06F 16/9038 (2019.01); G06F 16/906 (2019.01); G06N 3/0455 (2023.01)]
20 Claims
 
1. A computer-implemented method for extracting key-value pair included in a document, the method comprising:
determining, by a document classifier, a type of a document that is received as an input;
receiving a set of specific keywords desired by a user, wherein the user provides the desired keywords a priori;
representing the document as a plurality of tokens that are spatially distributed in the document;
extracting features of each of the tokens based on the determined document type;
associating one or more tags with each token, each of the tags encoding respective features of the token;
identifying, by a machine-learning technique, keyword tokens and value tokens from the plurality of tokens based on the encoded features;
determining respective affinity score between the keyword tokens and the value tokens based on the encoded features;
responsive to determining that an affinity score between a keyword token and a value token surpasses a predetermined threshold, defining an edge between said keyword token and said value token to indicate key-value relationship; and
presenting an output that shows the key-value relationship for the set of specific keywords desired by the user.
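
As a rough illustration of the affinity-scoring and edge-defining steps recited above, the following Python sketch links keyword tokens to value tokens and reports only the keywords a user asked for. It is not the patented implementation. The `Token` class, the distance-based `affinity` function, the `link_key_values` helper, the case-insensitive keyword match, and the 0.2 threshold are all hypothetical choices made for illustration, and the `role` field simply stands in for the machine-learning step that identifies keyword and value tokens.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Token:
    text: str
    x: float   # horizontal position on the page (arbitrary units, assumed)
    y: float   # vertical position on the page (arbitrary units, assumed)
    role: str  # "key" or "value"; placeholder for the ML identification step


def affinity(key: Token, value: Token) -> float:
    """Toy affinity score that decays with spatial distance.

    Assumption: a real system would score pairs from learned, encoded
    features; plain Euclidean distance is used here only as a stand-in.
    """
    return 1.0 / (1.0 + hypot(key.x - value.x, key.y - value.y))


def link_key_values(tokens, desired_keywords, threshold=0.2):
    """Define key-value edges above the threshold, then keep desired keys."""
    keys = [t for t in tokens if t.role == "key"]
    values = [t for t in tokens if t.role == "value"]

    # An edge exists only where the affinity surpasses the threshold.
    edges = [(k, v) for k in keys for v in values if affinity(k, v) > threshold]

    # Present only the relationships for the keywords supplied up front.
    wanted = {w.lower() for w in desired_keywords}
    result = {}
    for k, v in edges:
        if k.text.lower() in wanted:
            result.setdefault(k.text, []).append(v.text)
    return result


tokens = [
    Token("Policy No", 0.0, 0.0, "key"),
    Token("A-1234", 2.0, 0.0, "value"),
    Token("Name", 0.0, 10.0, "key"),
    Token("Jane Doe", 2.0, 10.0, "value"),
]

print(link_key_values(tokens, ["Policy No"]))  # {'Policy No': ['A-1234']}
```

In this toy layout each value sits two units to the right of its keyword, giving an affinity of 1/3, while the cross pairs are more than ten units apart and fall below the threshold, so no edge is defined between them.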