US 12,361,738 B2
Machine learning model-agnostic confidence calibration system and method
Sricharan Kallur Palli Kumar, Mountain View, CA (US); Thrathorn Rimchala, San Francisco, CA (US); Hui Chen, Frisco, TX (US); Preeti Duraipandian, Plano, TX (US); and Dominic Miguel Rossi, San Diego, CA (US)
Assigned to Intuit Inc., Mountain View, CA (US)
Filed by Intuit Inc., Mountain View, CA (US)
Filed on Apr. 30, 2021, as Appl. No. 17/246,383.
Prior Publication US 2022/0351088 A1, Nov. 3, 2022
Int. Cl. G06V 30/41 (2022.01); G06F 18/243 (2023.01); G06N 3/047 (2023.01); G06N 20/20 (2019.01); G06V 30/10 (2022.01); G06V 30/148 (2022.01)
CPC G06V 30/41 (2022.01) [G06F 18/24323 (2023.01); G06N 3/047 (2023.01); G06N 20/20 (2019.01); G06V 30/153 (2022.01); G06V 30/10 (2022.01)]
17 Claims
 
1. A method comprising:
extracting, from a scanned document by a first software extractor, a first key-value pair comprising a first key and a first value, the first key-value pair corresponding to a first confidence score, wherein the first confidence score is a probability that the first key matches an actual key in the scanned document and that the first value matches an actual value for the actual key;
extracting, from the scanned document by a second software extractor, a second key-value pair comprising the first key and a second value, the second key-value pair corresponding to a second confidence score, wherein the second confidence score is a probability that the first key matches the actual key in the scanned document and that the second value matches the actual value for the actual key, wherein the first key extracted by the second software extractor and the first key extracted by the first software extractor are a same key, and wherein the second software extractor is different than the first software extractor;
generating, from the first key-value pair, a first feature vector comprising a first location feature and the first confidence score, wherein the first location feature is determined from a first location of the first key with respect to a second location of the first value within the scanned document;
generating, from the second key-value pair, a second feature vector comprising a second location feature and the second confidence score, wherein the second location feature is determined from the first location of the first key with respect to a third location of the second value within the scanned document;
classifying, by a classifier and using the first location feature in the first feature vector, the first feature vector to generate a first match probability for the first key-value pair;
classifying, by the classifier and using the second location feature in the second feature vector, the second feature vector to generate a second match probability for the second key-value pair;
generating a first calibrated confidence score corresponding to the first confidence score and a second calibrated confidence score corresponding to the second confidence score by transforming, using precision lookup tables constructed from training records, the first match probability to the first calibrated confidence score and the second match probability to the second calibrated confidence score;
selecting, using the first calibrated confidence score and the second calibrated confidence score, one of the first key-value pair and the second key-value pair to obtain a selected key-value pair; and
presenting, in a graphical user interface (GUI), the selected key-value pair.
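
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Extraction`, `feature_vector`, `match_probability`, `build_precision_table`, `calibrate`, and `select_pair` are hypothetical, the classifier is stood in for by a logistic function with made-up weights, the location feature is assumed to be the x/y offset of the value relative to the key, and one equal-width binned precision table per extractor is assumed. The claim itself does not fix any of these choices.

```python
from bisect import bisect_right
from dataclasses import dataclass
from math import exp
from typing import List, Sequence, Tuple


@dataclass
class Extraction:
    """One key-value pair produced by one extractor (hypothetical structure)."""
    key: str
    value: str
    confidence: float              # raw, uncalibrated extractor confidence
    key_xy: Tuple[float, float]    # location of the key in the scanned document
    value_xy: Tuple[float, float]  # location of the value in the scanned document


def feature_vector(e: Extraction) -> List[float]:
    # Assumed location feature: offset of the value relative to the key.
    dx = e.value_xy[0] - e.key_xy[0]
    dy = e.value_xy[1] - e.key_xy[1]
    return [dx, dy, e.confidence]


def match_probability(features: Sequence[float],
                      weights: Sequence[float], bias: float) -> float:
    # Stand-in classifier: logistic regression with illustrative weights.
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + exp(-z))


def build_precision_table(records, n_bins: int = 4):
    """Build a lookup table from training records.

    records: iterable of (match_probability, was_correct) pairs.
    Returns (bin_edges, precision_per_bin), where precision is the fraction
    of correct training records whose match probability fell in that bin.
    """
    edges = [i / n_bins for i in range(1, n_bins)]
    hits = [0] * n_bins
    totals = [0] * n_bins
    for p, correct in records:
        b = bisect_right(edges, p)
        totals[b] += 1
        hits[b] += int(correct)
    precision = [h / t if t else 0.0 for h, t in zip(hits, totals)]
    return edges, precision


def calibrate(p: float, table) -> float:
    # Transform a match probability into a calibrated confidence score.
    edges, precision = table
    return precision[bisect_right(edges, p)]


def select_pair(candidates, weights, bias, tables):
    # Score each candidate with its extractor's table; keep the highest.
    scored = []
    for cand, table in zip(candidates, tables):
        p = match_probability(feature_vector(cand), weights, bias)
        scored.append((calibrate(p, table), cand))
    best_score, best = max(scored, key=lambda item: item[0])
    return best, best_score
```

In this sketch, two extractors that report the same key with different values are compared on calibrated scores rather than on their raw confidences, which is what makes the selection independent of how each extractor happens to scale its own confidence output.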