US 12,406,183 B2
Accelerated model training from disparate and heterogeneous sources using a meta-database
Peter Councill, Richmond, VA (US); Kenneth William Cluff, Powhatan, VA (US); Glenn Thomas Nofsinger, Reva, VA (US); James Xu, Raleigh, NC (US); and Qing Li, Cary, NC (US)
Assigned to TRUIST BANK, Charlotte, NC (US)
Filed by Truist Bank, Charlotte, NC (US)
Filed on May 12, 2022, as Appl. No. 17/663,122.
Prior Publication US 2023/0368013 A1, Nov. 16, 2023
Int. Cl. G06N 3/08 (2023.01)
CPC G06N 3/08 (2013.01)
20 Claims
 
1. A system for training a model from a subset of training data representing a plurality of source databases, the system comprising:
a computer including one or more processors and at least one of a memory device and a non-transitory storage device, wherein the one or more processors execute:
a source programming interface configured for interfacing with the plurality of source databases, the plurality of source databases including training data associated with a plurality of training variables, wherein a decentralized storage of the training data results in increased time required to train the model using the training data;
a meta-database programming interface configured for interfacing with a meta-database;
a key variable repository module configured to operably couple the plurality of source databases and the meta-database, the key variable repository module including an artificial intelligence program comprising:
a scanner algorithm configured to perform steps including:
communicate with the source programming interface to receive the training data of the source databases;
compress the training data of the source databases;
communicate with the meta-database programming interface and synchronize the meta-database with the compressed training data of the source databases;
a profiler algorithm configured to perform steps including:
communicate with the meta-database programming interface to receive the training data of the meta-database;
generate, based on the training data of the meta-database, granular data types for at least a portion of the training data of the meta-database;
determine a plurality of training variables indicative of at least a portion of the training data of the meta-database and generate, for each training variable, a probability distribution;
produce at least one association between at least two training variables of the plurality of training variables; and
communicate with the meta-database programming interface to modify the meta-database to include the probability distribution generated for each training variable and the at least one association produced between the at least two training variables; and
a key interface configured to, based on a communicated user input, search the meta-database for at least one of a training variable, a probability distribution for a training variable, or a produced association between training variables; and
a model configured to, using artificial intelligence programming and inference data, generate an inference, wherein the model requires training prior to generating the inference, and wherein using a subset of training data from the source databases and associated with the at least one training variable reduces the time required to train the model to generate the inference.
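
For illustration only, the scanner, profiler, and key interface steps recited in claim 1 might be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the in-memory dictionaries standing in for the source databases and the meta-database, the names SOURCES, scan, profile, search, and training_subset, the use of zlib for compression, empirical frequencies for the probability distributions, and Pearson correlation as the measure of association are all hypothetical choices that the claim does not specify.

```python
import json
import zlib
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical source databases: each maps a variable name to a column of values.
SOURCES = {
    "loans_db": {"income": [40, 55, 70, 85], "balance": [4, 6, 7, 9]},
    "crm_db": {"segment": ["a", "b", "a", "a"]},
}


def scan(sources, meta_db):
    """Scanner: receive each source's data, compress it, sync it into the meta-database."""
    for name, columns in sources.items():
        payload = json.dumps(columns).encode("utf-8")
        meta_db.setdefault("compressed", {})[name] = zlib.compress(payload)


def granular_type(values):
    """Assign a finer-grained type label to a column (illustrative rules only)."""
    if all(isinstance(v, bool) for v in values):
        return "boolean"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "integer"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "real"
    return "categorical"


def pearson(xs, ys):
    """Pearson correlation, assumed here as one possible measure of association."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def profile(meta_db):
    """Profiler: type each variable, build its distribution, record associations."""
    variables = {}
    for blob in meta_db["compressed"].values():
        variables.update(json.loads(zlib.decompress(blob)))
    meta_db["types"] = {v: granular_type(vals) for v, vals in variables.items()}
    # Empirical probability distribution: relative frequency of each observed value.
    meta_db["distributions"] = {
        v: {str(k): c / len(vals) for k, c in Counter(vals).items()}
        for v, vals in variables.items()
    }
    numeric = [v for v, t in meta_db["types"].items() if t in ("integer", "real")]
    meta_db["associations"] = {
        (a, b): pearson(variables[a], variables[b])
        for a, b in combinations(numeric, 2)
        if len(variables[a]) == len(variables[b])
    }


def search(meta_db, term):
    """Key interface: find variables, distributions, and associations matching a term."""
    return {
        "variables": [v for v in meta_db["types"] if term in v],
        "distributions": {
            v: d for v, d in meta_db["distributions"].items() if term in v
        },
        "associations": {
            k: r for k, r in meta_db["associations"].items()
            if term in k[0] or term in k[1]
        },
    }


def training_subset(sources, variable_names):
    """Pull only the columns for the chosen variables from the source databases."""
    return {
        var: values
        for columns in sources.values()
        for var, values in columns.items()
        if var in variable_names
    }


meta_db = {}
scan(SOURCES, meta_db)
profile(meta_db)
result = search(meta_db, "income")
subset = training_subset(SOURCES, result["variables"])
```

In this sketch, only the columns returned by training_subset would be passed on to model training, which mirrors the claim's point that training on a subset tied to the searched variable takes less time than training on all of the decentralized source data.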