US 12,292,784 B2
Systems and methods for evaluating, validating, correcting, and loading data feeds based on artificial intelligence input
Vijaya Kumar Vegulla, Telangana (IN); Rama Venkata S. Kavali, Frisco, TX (US); Venugopala Rao Randhi, Telangana (IN); and Damodarrao Thakkalapelli, Agoura Hills, CA (US)
Assigned to BANK OF AMERICA CORPORATION, Charlotte, NC (US)
Filed by BANK OF AMERICA CORPORATION, Charlotte, NC (US)
Filed on Jan. 22, 2024, as Appl. No. 18/418,458.
Application 18/418,458 is a continuation of application No. 17/680,561, filed on Feb. 25, 2022, granted, now Pat. No. 11,947,411.
Prior Publication US 2024/0160512 A1, May 16, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 11/07 (2006.01); G06F 11/34 (2006.01)
CPC G06F 11/0754 (2013.01) [G06F 11/0715 (2013.01); G06F 11/0793 (2013.01); G06F 11/3452 (2013.01)]
20 Claims
 
1. A system for evaluating, validating, correcting, and loading data feeds based on artificial intelligence input, the system comprising:
at least one non-transitory storage device; and
at least one processing device coupled to the at least one non-transitory storage device, wherein the at least one processing device is configured to:
receive a data feed from a source for loading to a target data structure;
apply a first machine learning model to the data feed, where the first machine learning model has been trained using first historical data to determine thresholds for likelihoods of data feeds failing to load, wherein the thresholds are determined based on (i) sizes of data feeds associated with higher likelihoods of failing to load and (ii) associations of the data feeds with repeat failure histories, and wherein the first historical data comprises characteristics of historical data feeds and outcomes of loading the historical data feeds;
determine, using the first machine learning model and based on characteristics of the data feed, a threshold for a likelihood of the data feed failing to load;
analyze, based on historical feed data, metadata of the data feed to determine a likelihood of the data feed failing to load;
determine whether the likelihood of the data feed failing to load satisfies the threshold;
determine whether the data feed is associated with a recent rejection of another data feed based on characteristics of the data feed, characteristics of historical data feeds, the source from which the data feed was received, and the target data structure to which the data feed is to be loaded;
load, based on determining that the likelihood of the data feed failing to load satisfies the threshold and based on determining that the data feed is not associated with the recent rejection of the other data feed, the data feed to the target data structure;
determine, after loading the data feed to the target data structure, whether the data feed failed to load;
based on determining that the data feed failed to load, iteratively and until each subdivided data feed loads or is added to a failed data log:
apply a second machine learning model trained using second historical data to determine numbers of parts into which to split data feeds to have shortest times for at least one of successfully identifying errors or successfully loading the data feeds, wherein splitting the data feeds comprises, for a given data feed, dividing a total number of rows in the given data feed by a number N of parts and splitting the given data feed into the number N of subdivided data feeds each having the same number of rows;
determine, using the second machine learning model, a number N of parts into which to split the data feed;
subdivide, into N subdivided data feeds, the data feed or each of the subdivided data feeds that fails to load and that is not a single data row;
load each of the subdivided data feeds to the target data structure; and
for each of the subdivided data feeds that fails to load, correct an error in the subdivided data feed or add the subdivided data feed to the failed data log;
transmit, to a user device associated with the source, the failed data log; and
continuously retrain the first machine learning model and the second machine learning model using data and metadata associated with loaded data feeds, subdividing the loaded data feeds, correcting errors in the loaded data feeds, outcomes of attempts to load subdivided parts of the loaded data feeds, and outcomes of attempts to load loaded data feeds including corrected errors.
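The load gate and the iterative subdivision recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and every name in it is hypothetical: `should_load`, `split_rows`, `load_with_subdivision`, and the callbacks `try_load`, `choose_n`, and `try_correct`. The two machine learning models are stood in for by plain values and callables. The sketch further assumes that "satisfies the threshold" means the failure likelihood is at or below the threshold, that a row count not evenly divisible by N leaves the last part shorter, that N is forced to at least 2 so each split makes progress, and that correction is attempted only once a failing feed has been reduced to a single row.

```python
import math
from collections import deque
from typing import Callable, Deque, Dict, List, Optional

Row = Dict[str, object]
Feed = List[Row]


def should_load(failure_likelihood: float, threshold: float,
                recent_rejection: bool) -> bool:
    """Gate: load only if the likelihood is within the threshold (assumed to
    mean <=) and the feed is not tied to a recent rejection."""
    return failure_likelihood <= threshold and not recent_rejection


def split_rows(rows: Feed, n: int) -> List[Feed]:
    """Split rows into up to n contiguous parts of ceil(len/n) rows each.
    Assumption: the last part is shorter when len(rows) is not divisible by n."""
    size = math.ceil(len(rows) / n)
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def load_with_subdivision(
    rows: Feed,
    try_load: Callable[[Feed], bool],
    choose_n: Callable[[Feed], int],
    try_correct: Optional[Callable[[Row], Optional[Row]]] = None,
) -> Feed:
    """Load a feed; on failure, keep subdividing the failing parts until each
    part loads or its single offending row is corrected or logged.

    try_load    -- hypothetical atomic loader; returns True on success.
    choose_n    -- stand-in for the second model; returns the number of parts.
    try_correct -- hypothetical row fixer; returns a fixed row or None.
    Returns the failed data log (rows that could not be loaded).
    """
    failed_log: Feed = []
    pending: Deque[Feed] = deque([list(rows)])
    while pending:
        feed = pending.popleft()
        if try_load(feed):
            continue  # this part is now in the target
        if len(feed) == 1:
            # A single row cannot be subdivided: correct it or log it.
            fixed = try_correct(feed[0]) if try_correct else None
            if fixed is None or not try_load([fixed]):
                failed_log.append(feed[0])
            continue
        # Force n >= 2 so every split yields strictly smaller parts.
        n = max(2, choose_n(feed))
        pending.extend(split_rows(feed, n))
    return failed_log
```

In use, a caller would first check `should_load(...)` and, if the load is attempted and fails, pass the feed to `load_with_subdivision` together with a loader that rejects any batch containing an invalid row. The returned list then plays the role of the failed data log that the claim transmits to the user device associated with the source. Retraining of the two models is outside the scope of this sketch.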