US 12,298,948 B2
Systems and/or methods for reinforced data cleaning and learning in machine learning inclusive computing environments
Mohamed Osman Mohamed Abdelaal, Stuttgart (DE)
Assigned to SOFTWARE GmbH, Darmstadt (DE)
Filed by SOFTWARE GmbH, Darmstadt (DE)
Filed on Apr. 14, 2023, as Appl. No. 18/134,913.
Prior Publication US 2024/0346002 A1, Oct. 17, 2024
Int. Cl. G06F 16/215 (2019.01)
CPC G06F 16/215 (2019.01)
20 Claims
 
1. A computer-based method of preparing a dirty dataset for use with an application that leverages a machine learned (ML) model, at least some of the data in the dirty dataset including errors, the method comprising:
storing, to non-transitory memory, a plurality of available computer-implemented repair tools that include different types of repair tools that are separately executable program code;
(a) extracting features from the dirty dataset;
(b) sampling a batch from the dirty dataset;
(c) selecting, via a neural network, a set of one or more computer-implemented repair tools from a plurality of available computer-implemented repair tools, provided that the sampled batch is determined to include at least one error;
(d) executing each one of the computer-implemented repair tools of the selected set, with the sampled batch being used as input to the set of one or more computer-implemented repair tools to generate a repaired sampled batch;
(e) training the ML model based on the repaired sampled batch;
(f) calculating a first loss metric based on performance of the trained ML model and a validation dataset;
(g) adjusting the trained ML model based on the first loss metric;
(h) calculating a second loss metric that is based on a difference between the first loss metric of the ML model and a moving average of previous losses;
(i) updating weights of the neural network based on the calculated second loss metric; and
(j) repeating (b)-(i) such that which repair tools are selected via the neural network for the set of one or more computer-implemented repair tools in (c) is modified based on how the weights of the neural network have been updated.
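
For illustration only, steps (a)-(j) can be sketched as a short NumPy loop. This is a minimal sketch under stated assumptions, not the patented implementation. The claim does not fix any of the following, and all of it is hypothetical: the toy dataset and its sentinel value, the two repair tools (replace_sentinel, clip_outliers), the error check (has_error), a single-layer sigmoid policy standing in for the neural network, a linear regressor standing in for the ML model, reading step (g) as "reject an update whose validation loss is far above the moving average", and a REINFORCE-style rule for step (i).

```python
import numpy as np

rng = np.random.default_rng(0)
SENTINEL = -999.0  # hypothetical encoding of a missing x value

def make_data(n, dirty):
    """Toy data for y = 2x + 1; the dirty variant gets sentinels and outliers."""
    x = rng.uniform(-1, 1, n)
    y = 2 * x + 1 + rng.normal(0, 0.1, n)
    if dirty:
        x[rng.random(n) < 0.1] = SENTINEL
        y[rng.random(n) < 0.1] += 50.0
    return np.column_stack([x, y])

# Two illustrative, separately callable repair tools (assumed, not from the claim).
def replace_sentinel(batch):
    out = batch.copy()
    bad = out[:, 0] == SENTINEL
    if bad.any() and (~bad).any():
        out[bad, 0] = np.median(out[~bad, 0])
    return out

def clip_outliers(batch):
    out = batch.copy()
    med = np.median(out[:, 1])
    out[:, 1] = np.clip(out[:, 1], med - 5.0, med + 5.0)
    return out

TOOLS = [replace_sentinel, clip_outliers]

def has_error(batch):
    return bool((batch[:, 0] == SENTINEL).any() or (np.abs(batch[:, 1]) > 10).any())

def extract_features(data):
    # Dataset-level error rates plus a constant bias term.
    return np.array([np.mean(data[:, 0] == SENTINEL),
                     np.mean(np.abs(data[:, 1]) > 10), 1.0])

def mse(model, data):
    pred = model[0] * data[:, 0] + model[1]
    return float(np.mean((pred - data[:, 1]) ** 2))

def train_step(model, batch, lr=0.1):
    x, y = batch[:, 0], batch[:, 1]
    err = model[0] * x + model[1] - y
    grad = np.array([np.mean(err * x), np.mean(err)])
    return model - lr * np.clip(grad, -5.0, 5.0)  # clipped for stability on dirty rows

def run(iterations=200, batch_size=32, policy_lr=0.5, beta=0.9):
    dirty, valid = make_data(2000, True), make_data(500, False)
    feats = extract_features(dirty)                               # (a)
    W = np.zeros((len(TOOLS), feats.size))                        # policy weights
    model, avg_loss, losses = np.zeros(2), None, []
    for _ in range(iterations):                                   # (j)
        batch = dirty[rng.choice(len(dirty), batch_size, replace=False)]  # (b)
        probs = 1.0 / (1.0 + np.exp(-(W @ feats)))
        chosen = np.zeros(len(TOOLS))
        if has_error(batch):                                      # (c)
            chosen = (rng.random(len(TOOLS)) < probs).astype(float)
            if not chosen.any():
                chosen[np.argmax(probs)] = 1.0                    # at least one tool
            for tool, on in zip(TOOLS, chosen):                   # (d)
                if on:
                    batch = tool(batch)
        candidate = train_step(model, batch)                      # (e)
        loss = mse(candidate, valid)                              # (f) first loss metric
        if avg_loss is None or loss <= 2.0 * avg_loss:            # (g) assumed reading
            model = candidate
        advantage = 0.0 if avg_loss is None else loss - avg_loss  # (h) second loss metric
        if chosen.any():                                          # (i) REINFORCE-style update
            W -= policy_lr * np.clip(advantage, -1.0, 1.0) * np.outer(chosen - probs, feats)
        avg_loss = loss if avg_loss is None else beta * avg_loss + (1 - beta) * loss
        losses.append(loss)
    return model, 1.0 / (1.0 + np.exp(-(W @ feats))), losses
```

Calling run() returns the fitted regressor parameters, the final per-tool selection probabilities, and the validation-loss history. Because the second loss metric is measured against a moving average, a tool selection is reinforced only when it yields a validation loss lower than recent iterations, which is how the selections in (c) shift as the weights are updated.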