US 12,456,319 B2
Systems and methods for machine learning key-value extraction on documents
Hu Cao, Cypress, CA (US)
Assigned to TUNGSTEN AUTOMATION CORPORATION, Irvine, CA (US)
Appl. No. 18/018,846
Filed by TUNGSTEN AUTOMATION CORPORATION, Irvine, CA (US)
PCT Filed Jul. 30, 2021, PCT No. PCT/US2021/044030
§ 371(c)(1), (2) Date Jan. 30, 2023,
PCT Pub. No. WO2022/026908, PCT Pub. Date Feb. 3, 2022.
Claims priority of provisional application 63/059,872, filed on Jul. 31, 2020.
Prior Publication US 2023/0306768 A1, Sep. 28, 2023
Int. Cl. G06V 30/19 (2022.01); G06V 30/18 (2022.01); G06V 30/412 (2022.01); G06V 30/413 (2022.01)
CPC G06V 30/19147 (2022.01) [G06V 30/18181 (2022.01); G06V 30/19173 (2022.01); G06V 30/412 (2022.01); G06V 30/413 (2022.01)] 13 Claims
 
1. A method to train a machine-learning model to extract key-values from documents, the method comprising:
receiving a collection of training document images;
creating a training data set from the collection; and
training a classification model using the training data set, wherein training the classification model further comprises applying at least a plurality of features to the training data set, the plurality of features comprising:
a first feature corresponding to a location of an n-gram,
a second feature corresponding to a plurality of letter cases,
a third feature corresponding to textual character type,
a fourth feature corresponding to a regular expression,
a fifth feature corresponding to a number of characters used to distinguish between different entities among target classes, and
a sixth feature corresponding to punctuation.
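The six features recited in claim 1 can be illustrated with a minimal Python sketch. It rests on several assumptions that the claim does not state. Each n-gram is taken to arrive from OCR with a pixel bounding-box origin. The date and amount regular expressions are hypothetical examples. The fifth feature is read simply as the n-gram's character count. A nearest-centroid classifier stands in for the unspecified "classification model". This is an illustration of the feature categories, not the patented implementation.

```python
import re
import string
from dataclasses import dataclass
from math import dist

# Hypothetical patterns; the claim only recites "a regular expression".
DATE_RE = re.compile(r"^\d{1,2}[/-]\d{1,2}[/-]\d{2,4}$")
AMOUNT_RE = re.compile(r"^\$?\d{1,3}(,\d{3})*(\.\d{2})?$")


@dataclass
class NGram:
    text: str
    x: float  # assumed: left edge of the OCR bounding box, in pixels
    y: float  # assumed: top edge of the OCR bounding box, in pixels


def ngram_features(ng: NGram, page_w: float, page_h: float) -> list[float]:
    """Return one feature vector covering the six claimed feature categories."""
    text = ng.text
    letters = [c for c in text if c.isalpha()]
    n_letters = len(letters) or 1  # avoid division by zero for letter-free n-grams
    n_chars = max(len(text), 1)
    tokens = text.split()
    return [
        # 1. Location of the n-gram, normalized to the page size.
        ng.x / page_w,
        ng.y / page_h,
        # 2. Letter cases: share of upper-case letters and an all-title-case flag.
        sum(c.isupper() for c in letters) / n_letters,
        float(all(t.istitle() for t in tokens)) if tokens else 0.0,
        # 3. Textual character type: fractions of digits and of letters.
        sum(c.isdigit() for c in text) / n_chars,
        len(letters) / n_chars,
        # 4. Regular expressions: full matches against the hypothetical patterns.
        float(bool(DATE_RE.match(text))),
        float(bool(AMOUNT_RE.match(text))),
        # 5. Number of characters (assumed reading of the fifth feature).
        float(len(text)),
        # 6. Punctuation: count of punctuation marks and a trailing-colon flag.
        float(sum(c in string.punctuation for c in text)),
        float(text.endswith(":")),
    ]


def train_centroids(samples):
    """samples: iterable of (feature_vector, label); returns label -> mean vector."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(centroids, vec):
    """Assign the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda lab: dist(centroids[lab], vec))


# Toy labeled n-grams standing in for a training set built from document images.
PAGE_W, PAGE_H = 1000.0, 1400.0
training = [
    (NGram("Invoice Date:", 50, 100), "key"),
    (NGram("Total:", 50, 1200), "key"),
    (NGram("01/15/2021", 200, 100), "date"),
    (NGram("12/31/2020", 220, 110), "date"),
    (NGram("$1,250.00", 200, 1200), "amount"),
    (NGram("$99.95", 210, 1210), "amount"),
]
model = train_centroids(
    (ngram_features(ng, PAGE_W, PAGE_H), lab) for ng, lab in training
)
print(predict(model, ngram_features(NGram("02/01/2021", 205, 105), PAGE_W, PAGE_H)))
# prints "date"
```

Because the raw character count is unscaled, it can dominate the Euclidean distance. A practical system would likely normalize features before training and use a stronger classifier than a nearest centroid.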