US 12,456,319 B2
Systems and methods for machine learning key-value extraction on documents
Hu Cao, Cypress, CA (US)
Assigned to TUNGSTEN AUTOMATION CORPORATION, Irvine, CA (US)
Appl. No. 18/018,846
Filed by TUNGSTEN AUTOMATION CORPORATION, Irvine, CA (US)
PCT Filed Jul. 30, 2021, PCT No. PCT/US2021/044030 § 371(c)(1), (2) Date Jan. 30, 2023, PCT Pub. No. WO2022/026908, PCT Pub. Date Feb. 3, 2022.
Claims priority of provisional application 63/059,872, filed on Jul. 31, 2020.
Prior Publication US 2023/0306768 A1, Sep. 28, 2023
Int. Cl. G06V 30/19 (2022.01); G06V 30/18 (2022.01); G06V 30/412 (2022.01); G06V 30/413 (2022.01)

CPC G06V 30/19147 (2022.01) [G06V 30/18181 (2022.01); G06V 30/19173 (2022.01); G06V 30/412 (2022.01); G06V 30/413 (2022.01)]

13 Claims

1. A method to train a machine-learning model to extract key-values from documents, the method comprising:

receiving a collection of training document images;

creating a training data set from the collection; and

training a classification model using the training data set, wherein training the classification model further comprises applying at least a plurality of features to the training data set, the plurality of features comprising:

a first feature corresponding to a location of an n-gram,

a second feature corresponding to a plurality of letter cases,

a third feature corresponding to textual character type,

a fourth feature corresponding to a regular expression,

a fifth feature corresponding to a number of characters used to distinguish between different entities among target classes, and

a sixth feature corresponding to punctuation.
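
As an illustration only, the six recited features could be computed per n-gram along the following lines. This is a minimal sketch, not the patented implementation: the `NGram` type, the normalized `x`/`y` coordinates, the date pattern `DATE_RE`, and the reading of the fifth feature as a simple character count are all assumptions introduced here, since the claim does not specify how any feature is encoded.

```python
import re
import string
from dataclasses import dataclass


@dataclass
class NGram:
    """Hypothetical container for one n-gram taken from an OCR'd page."""
    text: str
    x: float  # assumed: horizontal position normalized to 0..1
    y: float  # assumed: vertical position normalized to 0..1


# Assumed example pattern; the claim only recites "a regular expression".
DATE_RE = re.compile(r"\d{1,2}/\d{1,2}/\d{2,4}")


def letter_case(text: str) -> str:
    """Classify the letter case of the n-gram into one of several cases."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return "none"
    if all(c.isupper() for c in letters):
        return "upper"
    if all(c.islower() for c in letters):
        return "lower"
    if text.istitle():
        return "title"
    return "mixed"


def extract_features(ng: NGram) -> dict:
    """Build one feature dictionary per n-gram, one entry group per feature."""
    t = ng.text
    return {
        # first feature: location of the n-gram
        "x": ng.x,
        "y": ng.y,
        # second feature: letter case
        "case": letter_case(t),
        # third feature: textual character type
        "has_digit": any(c.isdigit() for c in t),
        "has_alpha": any(c.isalpha() for c in t),
        # fourth feature: regular-expression match
        "matches_date": DATE_RE.fullmatch(t) is not None,
        # fifth feature: number of characters (assumed interpretation)
        "num_chars": len(t),
        # sixth feature: punctuation
        "num_punct": sum(c in string.punctuation for c in t),
    }


key_features = extract_features(NGram("Invoice Date:", 0.10, 0.20))
value_features = extract_features(NGram("07/30/2021", 0.40, 0.20))
```

Feature dictionaries of this kind, paired with labels from the training data set, could then be vectorized and passed to whichever classification model is chosen; the claim does not name a specific model.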