CPC G06N 5/022 (2013.01) | 12 Claims |
1. A system comprising:
at least one hardware processor; and
at least one memory storing instructions that cause the at least one hardware processor to perform operations comprising:
causing presentation of a user interface for training a select machine-learning model, the select machine-learning model being configured to extract values for one or more data points from electronic documents;
adding, by the user interface, a set of data points and a set of questions, the set of questions corresponding to the set of data points;
using the select machine-learning model to extract, from an uploaded electronic document, at least a set of values for the set of data points based on the set of questions;
causing presentation of the set of data points, the set of values, and the set of questions in the user interface;
receiving, by the user interface, user feedback with respect to one or more of the set of values;
performing a training process on the select machine-learning model to generate a custom machine-learning model from the select machine-learning model based on the user feedback and the uploaded electronic document;
publishing the custom machine-learning model as a database object on a data platform for use in extracting values for at least the set of data points from one or more electronic documents;
receiving a database command that generates a document information extraction pipeline based on the database object; and
in response to the database command, generating the document information extraction pipeline on the data platform based on the database object, the document information extraction pipeline comprising a software service that is continuously running on the data platform and that is configured to perform operations comprising:
monitoring for a set of input electronic documents;
receiving the set of input electronic documents;
using the custom machine-learning model of the database object to extract a set of extracted values from each input electronic document in the set of input electronic documents; and
storing each set of extracted values in a target table on the data platform, the target table being specified by the database command.
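The pipeline operations recited at the end of claim 1 (monitoring for input documents, extracting values with the custom model, and storing them in a target table) can be illustrated with a minimal in-memory sketch. Nothing below comes from the claim itself or from any real data platform: `ModelObject`, `ExtractionPipeline`, `poll`, and `toy_extract` are hypothetical names, the "stage" and "target table" are plain Python containers, and the extractor is a toy label lookup standing in for a trained model. A continuously running service would call `poll` in a loop; a single pass is shown so the behavior is easy to check.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Hypothetical stand-in for the custom model: (document text, question) -> value.
Extractor = Callable[[str, str], str]


@dataclass
class ModelObject:
    """Hypothetical 'database object' bundling the model with its data points."""
    questions: Dict[str, str]  # data point name -> question
    extract: Extractor


@dataclass
class ExtractionPipeline:
    """Hypothetical pipeline built from a model object and a target table."""
    model: ModelObject
    target_table: List[Dict[str, str]]
    seen: Set[str] = field(default_factory=set)

    def poll(self, stage: Dict[str, str]) -> int:
        """One monitoring pass: process unseen documents, return the count."""
        processed = 0
        for name, text in stage.items():
            if name in self.seen:
                continue  # already extracted on an earlier pass
            row = {"document": name}
            for data_point, question in self.model.questions.items():
                row[data_point] = self.model.extract(text, question)
            self.target_table.append(row)  # store the extracted values
            self.seen.add(name)
            processed += 1
        return processed


def toy_extract(text: str, question: str) -> str:
    """Toy extractor: find the 'label: value' line whose label matches
    the last word of the question. Not a machine-learning model."""
    key = question.rstrip("?").split()[-1].lower()
    for line in text.splitlines():
        label, _, value = line.partition(":")
        if label.strip().lower() == key:
            return value.strip()
    return ""


# Example usage with made-up data.
model = ModelObject(
    questions={"invoice_total": "What is the total?",
               "vendor": "Who is the vendor?"},
    extract=toy_extract,
)
table: List[Dict[str, str]] = []
pipeline = ExtractionPipeline(model=model, target_table=table)
stage = {"inv1.txt": "Vendor: Acme\nTotal: 42.00"}
pipeline.poll(stage)
print(table)
```

Tracking processed document names in `seen` is one simple way to make repeated monitoring passes idempotent; the claim does not specify how the service detects new documents.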