US 11,990,214 B2
Handling form data errors arising from natural language processing
Paul Joseph Hake, Madison, CT (US); Igor S. Ramos, Round Rock, TX (US); Andrew J. Lavery, Austin, TX (US); and Scott Carrier, New Hill, NC (US)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Jul. 21, 2020, as Appl. No. 16/934,061.
Prior Publication US 2022/0028502 A1, Jan. 27, 2022
Int. Cl. G16H 10/60 (2018.01); G06F 16/93 (2019.01); G06F 18/20 (2023.01); G06F 18/214 (2023.01); G06F 18/24 (2023.01); G06F 40/20 (2020.01); G06V 10/22 (2022.01); G06V 30/40 (2022.01); G16H 15/00 (2018.01); G16H 50/70 (2018.01)
CPC G16H 10/60 (2018.01) [G06F 16/93 (2019.01); G06F 18/214 (2023.01); G06F 18/24 (2023.01); G06F 18/285 (2023.01); G06F 40/20 (2020.01); G06V 10/22 (2022.01); G06V 30/40 (2022.01); G16H 15/00 (2018.01); G16H 50/70 (2018.01)]
24 Claims
1. A method comprising:
receiving a document at a processor;
classifying, by the processor, at least a subset of the document as a style of a form of the at least a subset of the document, wherein the style is one of: a binary response form style, a checkbox form style, a circle selection form style, a radio button form style, an underline selection form, a questionnaire, or an unrecognized style of form;
extracting features from the document, the extracting comprising:
initiating processing of the at least a subset of the document by a first processing engine trained to extract features from the style;
initiating processing of a remaining portion of the document not included in the at least a subset of the document by a second processing engine trained to extract features from a non-form-type of data; and
receiving features from one or both of the first processing engine and the second processing engine;
storing, by the processor, the received features as features of the document; and
determining, via the first processing engine or the second processing engine, an accuracy weight of the features of the document based on an identity associated with a data entry of the form.
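The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the rule-based classifier stands in for whatever trained model a real system would use, only two of the recited form styles are detected, and every name here (FormStyle, Feature, IDENTITY_WEIGHTS, classify_style, form_engine, text_engine, process_document) and every weight value is a hypothetical chosen for illustration. The sketch also assumes that the identity-based accuracy weight is applied only to form-derived features, which is one possible reading of the claim.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class FormStyle(Enum):
    # Only two of the styles recited in claim 1 are detected in this sketch.
    BINARY_RESPONSE = "binary response"
    CHECKBOX = "checkbox"


@dataclass
class Feature:
    name: str
    value: str
    source: str          # "form" or "non-form"
    weight: float = 1.0  # accuracy weight; 1.0 means "not adjusted"


# Hypothetical accuracy weights keyed by who entered the form data.
IDENTITY_WEIGHTS = {"clinician": 1.0, "patient": 0.7}
DEFAULT_WEIGHT = 0.5


def classify_style(line: str) -> Optional[FormStyle]:
    """Toy stand-in for the classifier: None means non-form text."""
    s = line.strip()
    if s.startswith(("[x]", "[ ]")):
        return FormStyle.CHECKBOX
    if "?" in s and s.lower().endswith(("yes", "no")):
        return FormStyle.BINARY_RESPONSE
    return None


def form_engine(line: str, style: FormStyle) -> Feature:
    """Stand-in for the first processing engine (form-style data)."""
    s = line.strip()
    if style is FormStyle.CHECKBOX:
        value = "checked" if s.startswith("[x]") else "unchecked"
        return Feature(s[3:].strip(), value, "form")
    question, _, answer = s.rpartition("?")
    return Feature(question.strip(), answer.strip().lower(), "form")


def text_engine(line: str) -> Feature:
    """Stand-in for the second processing engine (non-form data)."""
    return Feature("text", line.strip(), "non-form")


def process_document(text: str, entered_by: str) -> List[Feature]:
    """Classify each line, route it to an engine, and weight form features."""
    weight = IDENTITY_WEIGHTS.get(entered_by, DEFAULT_WEIGHT)
    features: List[Feature] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        style = classify_style(line)
        if style is None:
            features.append(text_engine(line))
        else:
            feature = form_engine(line, style)
            feature.weight = weight  # assumption: weight form features only
            features.append(feature)
    return features


doc = "Patient reports mild cough.\n[x] Diabetes\n[ ] Asthma\nSmoker? No"
features = process_document(doc, entered_by="patient")
for f in features:
    print(f)
```

In this sketch the returned list plays the role of the stored "features of the document", and a form entry filled in by a patient carries a lower weight than one filled in by a clinician. The direction and size of that adjustment are illustrative only.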