US 12,406,183 B2
Accelerated model training from disparate and heterogeneous sources using a meta-database
Peter Councill, Richmond, VA (US); Kenneth William Cluff, Powhatan, VA (US); Glenn Thomas Nofsinger, Reva, VA (US); James Xu, Raleigh, NC (US); and Qing Li, Cary, NC (US)
Assigned to TRUIST BANK, Charlotte, NC (US)
Filed by Truist Bank, Charlotte, NC (US)
Filed on May 12, 2022, as Appl. No. 17/663,122.
Prior Publication US 2023/0368013 A1, Nov. 16, 2023
Int. Cl. G06N 3/08 (2023.01)

CPC G06N 3/08 (2013.01)

20 Claims

1. A system for training a model from a subset of training data representing a plurality of source databases, the system comprising:

a computer including one or more processors and at least one of a memory device and a non-transitory storage device, wherein the one or more processors execute:

a source programming interface configured for interfacing with the plurality of source databases, the plurality of source databases including training data associated with a plurality of training variables, wherein a decentralized storage of the training data results in increased time required to train the model using the training data;

a meta-database programming interface configured for interfacing with a meta-database;

a key variable repository module configured to operably couple the plurality of source databases and the meta-database, the key variable repository module including an artificial intelligence program comprising:

a scanner algorithm configured to perform steps including:

communicate with the source programming interface to receive the training data of the source databases;

compress the training data of the source databases;

communicate with the meta-database programming interface and synchronize the meta-database with the compressed training data of the source databases;

a profiler algorithm configured to perform steps including:

communicate with the meta-database programming interface to receive the training data of the meta-database;

generate, based on the training data of the meta-database, granular data types for at least a portion of the training data of the meta-database;

determine a plurality of training variables indicative of at least a portion of the training data of the meta-database and generate, for each training variable, a probability distribution;

produce at least one association between at least two training variables of the plurality of training variables; and

communicate with the meta-database programming interface to modify the meta-database to include the probability distribution generated for each training variable and the at least one association produced between the at least two training variables; and

a key interface configured to, based on a communicated user input, search the meta-database for at least one of a training variable, a probability distribution for a training variable, or a produced association between training variables; and

a model configured to, using artificial intelligence programming and inference data, generate an inference, wherein the model requires training prior to generating the inference, and wherein using a subset of training data from the source databases and associated with the at least one training variable reduces the time required to train the model to generate the inference.
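
For illustration only, the scanner, profiler, and key interface recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the claim does not specify a storage engine, compression scheme, type taxonomy, or association measure. Here the meta-database is assumed to be an in-memory SQLite database, compression is assumed to be zlib over JSON, the granular type labels are hypothetical, the probability distribution is an empirical frequency table, and the association is assumed to be a Pearson correlation. All function, table, and variable names (`scan`, `profile`, `search`, `snapshots`, `loans`, and so on) are invented for this example.

```python
import json
import sqlite3
import zlib
from collections import Counter
from itertools import combinations


def scan(sources, meta):
    """Scanner: receive source rows, compress them, and sync them into the meta-database."""
    meta.execute("CREATE TABLE IF NOT EXISTS snapshots (source TEXT PRIMARY KEY, payload BLOB)")
    for name, rows in sources.items():
        payload = zlib.compress(json.dumps(rows).encode("utf-8"))
        # INSERT OR REPLACE keeps one current snapshot per source (the "synchronize" step).
        meta.execute("INSERT OR REPLACE INTO snapshots VALUES (?, ?)", (name, payload))
    meta.commit()


def granular_type(values):
    """Infer a finer-grained type label (hypothetical taxonomy, assumed for this sketch)."""
    if all(isinstance(v, bool) for v in values):
        return "boolean"
    if all(isinstance(v, int) for v in values):
        return "integer"
    if all(isinstance(v, (int, float)) for v in values):
        return "float"
    return "category"


def pearson(xs, ys):
    """Pearson correlation, used here as an assumed association measure."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def profile(meta):
    """Profiler: read the meta-database, then write back types, distributions, associations."""
    meta.execute("CREATE TABLE IF NOT EXISTS variables "
                 "(name TEXT PRIMARY KEY, gtype TEXT, distribution TEXT)")
    meta.execute("CREATE TABLE IF NOT EXISTS associations "
                 "(var_a TEXT, var_b TEXT, strength REAL, PRIMARY KEY (var_a, var_b))")
    for source, payload in meta.execute("SELECT source, payload FROM snapshots").fetchall():
        rows = json.loads(zlib.decompress(payload))
        if not rows:
            continue
        columns = {key: [row[key] for row in rows] for key in rows[0]}
        for key, values in columns.items():
            # Empirical probability distribution: relative frequency of each observed value.
            dist = {str(v): c / len(values) for v, c in Counter(values).items()}
            meta.execute("INSERT OR REPLACE INTO variables VALUES (?, ?, ?)",
                         (f"{source}.{key}", granular_type(values), json.dumps(dist)))
        numeric = [k for k, v in columns.items() if granular_type(v) in ("integer", "float")]
        for a, b in combinations(numeric, 2):
            meta.execute("INSERT OR REPLACE INTO associations VALUES (?, ?, ?)",
                         (f"{source}.{a}", f"{source}.{b}", pearson(columns[a], columns[b])))
    meta.commit()


def search(meta, term):
    """Key interface: look up variables and associations whose names contain `term`."""
    like = f"%{term}%"
    variables = meta.execute(
        "SELECT name, gtype, distribution FROM variables WHERE name LIKE ?", (like,)).fetchall()
    associations = meta.execute(
        "SELECT var_a, var_b, strength FROM associations WHERE var_a LIKE ? OR var_b LIKE ?",
        (like, like)).fetchall()
    return variables, associations


# Hypothetical source database with three training variables.
sources = {
    "loans": [
        {"income": 40, "balance": 10, "defaulted": False},
        {"income": 60, "balance": 20, "defaulted": False},
        {"income": 80, "balance": 30, "defaulted": True},
    ],
}
meta = sqlite3.connect(":memory:")
scan(sources, meta)
profile(meta)
variables, associations = search(meta, "income")
```

In this sketch, a search for `income` returns the profiled variable together with its association to `balance`. A user could then pull only those columns from the source databases as the training subset, which is the step the claim ties to reduced training time.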