CPC G06V 30/19147 (2022.01) [G06V 30/18181 (2022.01); G06V 30/19173 (2022.01); G06V 30/412 (2022.01); G06V 30/413 (2022.01)] | 13 Claims

1. A method to train a machine-learning model to extract key-values from documents, the method comprising:
receiving a collection of training document images;
creating a training data set from the collection; and
training a classification model using the training data set, wherein training the classification model further comprises applying at least a plurality of features to the training data set, the plurality of features comprising:
a first feature corresponding to a location of an n-gram,
a second feature corresponding to a plurality of letter cases,
a third feature corresponding to textual character type,
a fourth feature corresponding to a regular expression,
a fifth feature corresponding to a number of characters used to distinguish between different entities among target classes, and
a sixth feature corresponding to punctuation.
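The six features recited in claim 1 can be illustrated with a short Python sketch. It turns one n-gram from an OCR'd document image into a feature dictionary that a classifier could consume, then pairs those dictionaries with labels to form a training set. This is not the patentee's implementation. Several choices are assumptions made only for the sketch:

- normalized page coordinates for the location feature,
- the example regular expressions in `PATTERNS`,
- the letter-case and character-type buckets,
- reading the fifth feature as a plain character count.

The names `NGram`, `extract_features`, and `build_training_set` are hypothetical.

```python
import re
import string
from dataclasses import dataclass

# Hypothetical regular expressions for common value formats
# (an assumption for illustration, not taken from the patent).
PATTERNS = {
    "date": re.compile(r"^\d{1,2}[/-]\d{1,2}[/-]\d{2,4}$"),
    "amount": re.compile(r"^\$?\d{1,3}(,\d{3})*(\.\d{2})?$"),
}


@dataclass
class NGram:
    """An n-gram from an OCR'd page with an assumed normalized position."""
    text: str
    x: float  # left edge as a fraction of page width (0..1)
    y: float  # top edge as a fraction of page height (0..1)


def letter_case(text: str) -> str:
    """Bucket the n-gram by letter case."""
    if not any(c.isalpha() for c in text):
        return "none"
    if text.isupper():
        return "upper"
    if text.islower():
        return "lower"
    if text.istitle():
        return "title"
    return "mixed"


def char_type(text: str) -> str:
    """Bucket the n-gram by the kinds of characters it contains."""
    has_alpha = any(c.isalpha() for c in text)
    has_digit = any(c.isdigit() for c in text)
    if has_alpha and has_digit:
        return "alphanumeric"
    if has_digit:
        return "numeric"
    if has_alpha:
        return "alpha"
    return "other"


def extract_features(ngram: NGram) -> dict:
    """Map one n-gram to the six claimed feature families."""
    text = ngram.text
    features = {
        "x": ngram.x,                   # 1st: location of the n-gram
        "y": ngram.y,
        "case": letter_case(text),      # 2nd: letter case
        "char_type": char_type(text),   # 3rd: textual character type
        "num_chars": len(text),         # 5th: character count (assumed reading)
        "num_punct": sum(c in string.punctuation for c in text),  # 6th: punctuation
    }
    # 4th: one boolean per regular expression
    for name, pattern in PATTERNS.items():
        features[f"re_{name}"] = pattern.match(text) is not None
    return features


def build_training_set(labeled_ngrams):
    """Turn (NGram, label) pairs into (features, label) rows."""
    return [(extract_features(ng), label) for ng, label in labeled_ngrams]


# Usage: two labeled n-grams from a hypothetical invoice image.
rows = build_training_set([
    (NGram("Invoice Date:", 0.08, 0.12), "key"),
    (NGram("12/31/2023", 0.30, 0.12), "value"),
])
```

The claim does not name a particular classification model. Any off-the-shelf classifier could be trained on rows like these, for example logistic regression or gradient-boosted trees after one-hot encoding the categorical values.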