US 12,266,203 B2
Multiple input machine learning framework for anomaly detection
Fadoua Khmaissia, Louisville, KY (US); Efraim David Feinstein, San Jose, CA (US); and Preeti Duraipandian, Plano, TX (US)
Assigned to Intuit Inc., Mountain View, CA (US)
Filed by Intuit Inc., Mountain View, CA (US)
Filed on Oct. 29, 2021, as Appl. No. 17/515,163.
Prior Publication US 2023/0132720 A1, May 4, 2023
Int. Cl. G06V 30/416 (2022.01); G06F 18/2113 (2023.01); G06N 3/088 (2023.01); G06V 30/19 (2022.01)

CPC G06V 30/416 (2022.01) [G06F 18/2113 (2023.01); G06N 3/088 (2013.01); G06V 30/19013 (2022.01)]

15 Claims

1. A method comprising:

for each extraction model of a plurality of extraction models,

    generating, by the extraction model, a plurality of key identifier, key value pairs for the extraction model for a possible training document image, and

    comparing the plurality of key identifier, key value pairs for the extraction model with a plurality of key identifier, key value pairs in ground truth information defined for the possible training document image to obtain a combined accuracy level for the extraction model;

aggregating the combined accuracy level across the plurality of extraction models to obtain an oracle accuracy for the possible training document image;

selecting the possible training document image as a training document image when the oracle accuracy is greater than an accuracy threshold;

executing an anomaly detection model on a first plurality of features obtained for the training document image to obtain reconstructed input;

determining a reconstruction error based at least in part on the reconstructed input;

updating a plurality of weights of the anomaly detection model using the reconstruction error;

extracting a plurality of image features of a document image of a document, wherein extracting the plurality of image features comprises:

    executing a convolutional neural network-based architecture on the document image to obtain an image embedding model feature, wherein the plurality of image features comprises the image embedding model feature;

executing an optical character recognition (OCR) engine on the document image to obtain OCR output;

extracting a plurality of OCR features from the OCR output;

executing, after updating the plurality of weights, the anomaly detection model using a second plurality of features comprising the plurality of OCR features and the plurality of image features to generate an anomaly score; and

presenting the anomaly score.
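
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions that the claim leaves open: the combined accuracy is taken as the fraction of ground-truth pairs matched exactly, the aggregation is taken as the maximum across extraction models, the anomaly detection model is a small linear autoencoder, and the anomaly score is the mean squared reconstruction error. All function names, class names, and feature values are hypothetical.

```python
import random

def combined_accuracy(predicted, ground_truth):
    """Fraction of ground-truth key/value pairs matched exactly (assumed metric)."""
    if not ground_truth:
        return 0.0
    hits = sum(1 for k, v in ground_truth.items() if predicted.get(k) == v)
    return hits / len(ground_truth)

def oracle_accuracy(per_model_predictions, ground_truth, aggregate=max):
    """Aggregate per-model accuracies; max is an assumption, not stated in the claim."""
    return aggregate(combined_accuracy(p, ground_truth) for p in per_model_predictions)

def select_training_documents(candidates, threshold):
    """Keep candidates whose oracle accuracy is greater than the threshold."""
    return [c for c in candidates
            if oracle_accuracy(c["predictions"], c["ground_truth"]) > threshold]

class TinyAutoencoder:
    """Linear autoencoder standing in for the anomaly detection model (assumption)."""

    def __init__(self, n_features, n_hidden, seed=0):
        rng = random.Random(seed)
        self.enc = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
                    for _ in range(n_hidden)]
        self.dec = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
                    for _ in range(n_features)]

    def reconstruct(self, x):
        """Return the hidden code and the reconstructed input."""
        h = [sum(w * xi for w, xi in zip(row, x)) for row in self.enc]
        r = [sum(w * hi for w, hi in zip(row, h)) for row in self.dec]
        return h, r

    def reconstruction_error(self, x):
        """Mean squared difference between the input and its reconstruction."""
        _, r = self.reconstruct(x)
        return sum((ri - xi) ** 2 for ri, xi in zip(r, x)) / len(x)

    def train_step(self, x, lr=0.05):
        """Update encoder and decoder weights by one gradient step on the error."""
        h, r = self.reconstruct(x)
        grad_r = [2 * (ri - xi) / len(x) for ri, xi in zip(r, x)]
        # Backpropagate through the decoder before its weights change.
        grad_h = [sum(self.dec[i][j] * grad_r[i] for i in range(len(x)))
                  for j in range(len(h))]
        for i in range(len(x)):
            for j in range(len(h)):
                self.dec[i][j] -= lr * grad_r[i] * h[j]
        for j in range(len(h)):
            for i in range(len(x)):
                self.enc[j][i] -= lr * grad_h[j] * x[i]

# Hypothetical candidate document scored by two extraction models.
truth = {"total": "10.00", "date": "2021-10-29"}
candidate = {"predictions": [{"total": "10.00", "date": "2021-10-28"},
                             {"total": "10.00", "date": "2021-10-29"}],
             "ground_truth": truth,
             "features": [1.0, 0.5, 0.0, 0.5]}
selected = select_training_documents([candidate], threshold=0.8)

# Train on made-up feature vectors derived from the selected document.
model = TinyAutoencoder(n_features=4, n_hidden=2)
base = selected[0]["features"]
train_set = [[s * v for v in base] for s in (0.6, 0.8, 1.0, 1.2, 1.4)]
error_before = model.reconstruction_error(base)
for _ in range(300):
    for x in train_set:
        model.train_step(x)
error_after = model.reconstruction_error(base)

# Score a new document: OCR features and image features are concatenated.
ocr_features, image_features = [0.0, 0.0], [1.0, 0.0]
anomaly_score = model.reconstruction_error(ocr_features + image_features)
print(anomaly_score)
```

In this sketch a document whose features differ from the training documents reconstructs poorly, so its score is higher than the error on the training document. A production system would presumably use a nonlinear model and real OCR and convolutional embedding features in place of the made-up vectors.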