US 12,236,355 B2
Generating machine-learning model for document extraction
Michal Gdak, Warsaw (PL); Ganeshan Ramachandran Iyer, Redmond, WA (US); Tomasz Malisz, Bialystok (PL); Mikolaj Niedbala, Poznan (PL); Pawel Pollak, Warsaw (PL); Saurin Shah, Kirkland, WA (US); Jan Tomasz Topinski, Izabelin (PL); and Daria Wieteska, Warsaw (PL)
Assigned to Snowflake Inc., Bozeman, MT (US)
Filed by Snowflake Inc., Bozeman, MT (US)
Filed on Jan. 18, 2024, as Appl. No. 18/416,379.
Application 18/416,379 is a continuation of application No. 18/472,883, filed on Sep. 22, 2023, granted, now 11,922,328.
Claims priority of provisional application 63/495,174, filed on Apr. 10, 2023.
Prior Publication US 2024/0338577 A1, Oct. 10, 2024
Int. Cl. G06N 20/00 (2019.01); G06N 5/022 (2023.01)

CPC G06N 5/022 (2013.01)

12 Claims

1. A system comprising:

at least one hardware processor; and

at least one memory storing instructions that cause the at least one hardware processor to perform operations comprising:

causing presentation of a user interface for training a select machine-learning model, the select machine-learning model being configured to extract values for one or more data points from electronic documents;

adding, by the user interface, a set of data points and a set of questions, the set of questions corresponding to the set of data points;

using the select machine-learning model to extract, from an uploaded electronic document, at least a set of values for the set of data points based on the set of questions;

causing presentation of the set of data points, the set of values, and the set of questions in the user interface;

receiving, by the user interface, user feedback with respect to one or more of the set of values;

performing a training process on the select machine-learning model to generate a custom machine-learning model from the select machine-learning model based on the user feedback and the uploaded electronic document;

publishing the custom machine-learning model as a database object on a data platform for use in extracting values for at least the set of data points from one or more electronic documents;

receiving a database command that generates a document information extraction pipeline based on the database object; and

in response to the database command, generating the document information extraction pipeline on the data platform based on the database object, the document information extraction pipeline comprising a software service that is continuously running on the data platform and that is configured to perform operations comprising:

monitoring for a set of input electronic documents;

receiving the set of input electronic documents;

using the custom machine-learning model of the database object to extract a set of extracted values from each input electronic document in the set of input electronic documents; and

storing each set of extracted values in a target table on the data platform, the target table being specified by the database command.
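
The pipeline portion of claim 1 (monitor, receive, extract, store) can be illustrated with a minimal Python sketch. It is not the patented implementation and does not use any real data-platform API. Every name in it (`PublishedModel`, `ExtractionPipeline`, `poll_once`, `toy_extract`, the `inbox` list) is a hypothetical stand-in chosen for illustration. The extractor is a toy "key: value" line parser rather than a trained machine-learning model, and an in-memory list stands in for the target table.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# All names here are hypothetical; none come from the patent or a real platform API.
Extractor = Callable[[str, Dict[str, str]], Dict[str, str]]


@dataclass
class PublishedModel:
    """Stand-in for the custom model published as a database object."""
    name: str
    questions: Dict[str, str]  # data point -> question asked of the document
    extract: Extractor


@dataclass
class ExtractionPipeline:
    """Stand-in for the pipeline generated in response to the database command."""
    model: PublishedModel
    target_table: List[Dict[str, str]]  # assumed in-memory substitute for a table
    inbox: List[str] = field(default_factory=list)  # assumed monitored location

    def poll_once(self) -> int:
        """One monitoring pass: receive pending documents, extract, store."""
        pending, self.inbox = self.inbox, []
        for document in pending:
            row = self.model.extract(document, self.model.questions)
            self.target_table.append(row)
        return len(pending)


def toy_extract(document: str, questions: Dict[str, str]) -> Dict[str, str]:
    """Toy extractor: reads 'key: value' lines instead of running a model."""
    fields: Dict[str, str] = {}
    for line in document.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    # Return one value per requested data point, empty if not found.
    return {point: fields.get(point, "") for point in questions}


invoices: List[Dict[str, str]] = []
model = PublishedModel(
    name="invoice_model",
    questions={"vendor": "Who is the vendor?", "total": "What is the total?"},
    extract=toy_extract,
)
pipeline = ExtractionPipeline(model=model, target_table=invoices)
pipeline.inbox.append("Vendor: Acme\nTotal: 42.00")
processed = pipeline.poll_once()
```

In this sketch a continuously running service would amount to calling `poll_once` in a loop. The steps recited earlier in the claim (the user interface, user feedback, and the training process) are outside its scope.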