| CPC G06F 16/93 (2019.01) [G06F 16/90344 (2019.01); G06F 16/9038 (2019.01); G06F 16/906 (2019.01); G06N 3/0455 (2023.01)] | 20 Claims |

|
1. A computer-implemented method for extracting a key-value pair included in a document, the method comprising:
determining, by a document classifier, a type of a document that is received as an input;
receiving a set of specific keywords desired by a user, wherein the user provides the desired keywords a priori;
representing the document as a plurality of tokens that are spatially distributed in the document;
extracting features of each of the tokens based on the determined document type;
associating one or more tags with each token, each of the tags encoding respective features of the token;
identifying, by a machine-learning technique, keyword tokens and value tokens from the plurality of tokens based on the encoded features;
determining a respective affinity score between the keyword tokens and the value tokens based on the encoded features;
responsive to determining that an affinity score between a keyword token and a value token surpasses a predetermined threshold, defining an edge between said keyword token and said value token to indicate a key-value relationship; and
presenting an output that shows the key-value relationship for the set of specific keywords desired by the user.
|
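
The final three steps of claim 1 (scoring affinity, defining an edge when the score surpasses a threshold, and presenting results for the user's keywords) can be illustrated with a minimal sketch. It is not the claimed implementation. The `Token` class, the `affinity` function, the `link_key_values` helper, and the sample data are all hypothetical, and the claim's feature-based affinity score is replaced here by a simple inverse-distance stand-in over token positions.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Token:
    text: str
    x: float   # horizontal position in the document (hypothetical units)
    y: float   # vertical position in the document
    role: str  # "keyword" or "value", assumed to come from an upstream model


def affinity(keyword: Token, value: Token) -> float:
    """Hypothetical stand-in score: spatially closer tokens score nearer to 1."""
    return 1.0 / (1.0 + hypot(keyword.x - value.x, keyword.y - value.y))


def link_key_values(tokens, wanted_keywords, threshold):
    """Map each wanted keyword to the values whose score surpasses the threshold."""
    keywords = [t for t in tokens
                if t.role == "keyword" and t.text in wanted_keywords]
    values = [t for t in tokens if t.role == "value"]
    edges = {}
    for k in keywords:
        for v in values:
            # An edge is defined only when the score surpasses the threshold.
            if affinity(k, v) > threshold:
                edges.setdefault(k.text, []).append(v.text)
    return edges


tokens = [
    Token("Invoice No", 0.0, 0.0, "keyword"),
    Token("12345", 1.0, 0.0, "value"),
    Token("Date", 0.0, 5.0, "keyword"),
    Token("2024-01-31", 1.0, 5.0, "value"),
]

# Only the keyword requested by the user is reported.
result = link_key_values(tokens, {"Invoice No"}, threshold=0.4)
print(result)
```

In this sketch the user's keyword set filters which edges are reported, mirroring the claim's final presenting step; a real system would derive the score from the encoded token features rather than from distance alone.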