| CPC G06N 5/022 (2013.01) | 20 Claims |

|
1. A method, comprising:
receiving, by a device, a structured dataset, a feature of the structured dataset, and a protected categorical dimension of the structured dataset, wherein the structured dataset includes categorical data;
stratifying, by the device, the structured dataset into subsets based on the protected categorical dimension;
applying, by the device, a quantile transform to data of the subsets, to generate transformed subsets, wherein applying the quantile transform to the data of the subsets, to generate the transformed subsets, comprises:
converting the categorical data to numerical data,
adding random noise to one of the subsets based on a size threshold associated with the subsets, and
automatically selecting a quantity of bins for each of the subsets;
combining, by the device, the transformed subsets to generate a final dataset;
generating, by the device, a training dataset for a machine learning model based on the final dataset;
training, by the device, the machine learning model with the training dataset to generate a trained machine learning model to predict unbiased results based on processing biased data;
executing the trained machine learning model with the final dataset to generate one or more predictions;
determining that a prediction of the one or more predictions is incorrect; and
updating the trained machine learning model based on the incorrect prediction.
|
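The stratify, transform, and combine steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. The function names, the square-root rule for choosing the quantity of bins, the uniform noise, and the default `size_threshold` and `noise_scale` values are all assumptions made for illustration; the claim does not specify any of them. The sketch uses only the Python standard library.

```python
import math
import random
from collections import defaultdict


def encode_categories(values):
    """Convert categorical values to numeric codes (sorted order, assumed)."""
    codes = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [float(codes[v]) for v in values]


def choose_bin_count(n):
    """Hypothetical automatic rule: square root of subset size, at least 1."""
    return max(1, math.isqrt(n))


def quantile_transform(values, n_bins):
    """Map each value to the midpoint of its rank-based quantile bin in (0, 1).

    Ties are broken by original order, because Python's sort is stable.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, idx in enumerate(order):
        bin_index = min(n_bins - 1, rank * n_bins // n)
        out[idx] = (bin_index + 0.5) / n_bins
    return out


def stratified_quantile_transform(rows, feature, protected,
                                  size_threshold=5, noise_scale=1e-3, seed=0):
    """Stratify by the protected dimension, transform each subset, recombine."""
    rng = random.Random(seed)

    # Stratify: collect row indices per value of the protected dimension.
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[protected]].append(i)

    # Encode once over the whole dataset so codes agree across subsets.
    codes = encode_categories([row[feature] for row in rows])

    transformed = [None] * len(rows)
    for idxs in groups.values():
        vals = [codes[i] for i in idxs]
        # Assumed policy: add small noise only to subsets below the threshold.
        if len(idxs) < size_threshold:
            vals = [v + rng.uniform(-noise_scale, noise_scale) for v in vals]
        n_bins = choose_bin_count(len(vals))
        for i, q in zip(idxs, quantile_transform(vals, n_bins)):
            transformed[i] = q

    # Combine: return rows in their original order with the transformed feature.
    return [dict(row, **{feature + "_q": q}) for row, q in zip(rows, transformed)]
```

Because each subset is mapped onto the same (0, 1) quantile scale independently, the transformed feature has a similar distribution within every protected group, which is the property the later training step would rely on. The combined rows could then be split into a training dataset for whichever machine learning model is used.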