| CPC G06F 16/215 (2019.01) | 20 Claims |

|
1. A computer-based method of preparing a dirty dataset for use with an application that leverages a machine learned (ML) model, at least some of the data in the dirty dataset including errors, the method comprising:
storing, to non-transitory memory, a plurality of available computer-implemented repair tools that include different types of repair tools that are separately executable program code;
(a) extracting features from the dirty dataset;
(b) sampling a batch from the dirty dataset;
(c) selecting, via a neural network, a set of one or more computer-implemented repair tools from a plurality of available computer-implemented repair tools, provided that the sampled batch is determined to include at least one error;
(d) executing each one of the computer-implemented repair tools of the selected set, with the sampled batch being used as input to the set of one or more computer-implemented repair tools to generate a repaired sampled batch;
(e) training the ML model based on the repaired sampled batch;
(f) calculating a first loss metric based on performance of the trained ML model and a validation dataset;
(g) adjusting the trained ML model based on the first loss metric;
(h) calculating a second loss metric that is based on a difference between the first loss metric of the ML model and a moving average of previous losses;
(i) updating weights of the neural network based on the calculated second loss metric; and
(j) repeating (b)-(i) such that which repair tools are selected via the neural network for the set of one or more computer-implemented repair tools in (c) is modified based on how the weights of the neural network have been updated.
|
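For illustration only, steps (a) through (j) of claim 1 can be sketched as a small loop. Everything named here is hypothetical and not taken from the claim: the two repair tools (`impute_missing`, `drop_outliers`), the synthetic data generator `make_data`, the one-variable linear model standing in for the ML model, and a single sigmoid layer standing in for the neural network. The weight update in (i) is assumed to be a REINFORCE-style policy-gradient step, and the adjustment in (g) is assumed to be a rollback when the validation loss jumps; the claim does not specify either choice.

```python
import math
import random

# Hypothetical repair tools: each is a separately callable function on (x, y) rows.
def impute_missing(batch):
    """Replace missing x values (None) with the mean of the x values present."""
    present = [x for x, _ in batch if x is not None]
    mean = sum(present) / len(present) if present else 0.0
    return [(mean if x is None else x, y) for x, y in batch]

def drop_outliers(batch, limit=10.0):
    """Remove rows whose x lies outside [-limit, limit]; missing x passes through."""
    return [(x, y) for x, y in batch if x is None or abs(x) <= limit]

TOOLS = [impute_missing, drop_outliers]

def make_data(n, rng, dirty=True):
    """Synthetic rows with y = 2x + 1; roughly 10% missing x and 10% shifted x if dirty."""
    rows = []
    for _ in range(n):
        x = rng.uniform(-5.0, 5.0)
        y = 2.0 * x + 1.0
        r = rng.random() if dirty else 1.0
        if r < 0.1:
            x = None
        elif r < 0.2:
            x = x + math.copysign(12.0, x)
        rows.append((x, y))
    return rows

def extract_features(rows):
    """(a) Toy features: bias term, fraction missing, fraction of extreme values."""
    n = len(rows)
    missing = sum(x is None for x, _ in rows) / n
    extreme = sum(x is not None and abs(x) > 10.0 for x, _ in rows) / n
    return [1.0, missing, extreme]

def has_error(batch):
    return any(x is None or abs(x) > 10.0 for x, _ in batch)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

def mse(model, rows):
    w, b = model
    return sum((w * x + b - y) ** 2 for x, y in rows) / len(rows)

def run(dirty, validation, steps=200, batch_size=16, lr=0.002, policy_lr=0.05, seed=0):
    rng = random.Random(seed)
    feats = extract_features(dirty)                           # (a)
    weights = [[0.0] * len(feats) for _ in TOOLS]             # selector: one logit per tool
    model, avg_loss, history = [0.0, 0.0], None, []
    for _ in range(steps):                                    # (j) repeat (b)-(i)
        batch = rng.sample(dirty, batch_size)                 # (b)
        selecting = has_error(batch)
        probs = [sigmoid(sum(w * f for w, f in zip(row, feats))) for row in weights]
        picks = [selecting and rng.random() < p for p in probs]   # (c)
        repaired = batch
        for tool, picked in zip(TOOLS, picks):                # (d)
            if picked:
                repaired = tool(repaired)
        snapshot = list(model)
        for x, y in repaired:                                 # (e) one SGD pass
            if x is None:
                continue                                      # unrepaired rows are unusable
            err = model[0] * x + model[1] - y
            model[0] -= lr * err * x
            model[1] -= lr * err
        first_loss = mse(model, validation)                   # (f)
        if avg_loss is None:
            avg_loss = first_loss
        if first_loss > 2.0 * avg_loss:                       # (g) assumed: roll back
            model = snapshot
        second_loss = first_loss - avg_loss                   # (h)
        if selecting:                                         # (i) assumed: REINFORCE step
            for k, (p, picked) in enumerate(zip(probs, picks)):
                for j, f in enumerate(feats):
                    weights[k][j] -= policy_lr * second_loss * (picked - p) * f
        avg_loss = 0.9 * avg_loss + 0.1 * first_loss          # moving average of losses
        history.append(first_loss)
    return model, weights, history
```

A possible call is `run(make_data(400, random.Random(1)), make_data(100, random.Random(2), dirty=False))`, which returns the fitted model parameters, the selector weights, and the per-iteration first loss. Subtracting the moving average in (h) acts as a baseline: a tool selection is reinforced only when the resulting validation loss is lower than recent losses, not merely when it is low in absolute terms.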