US 12,406,191 B2
Systems and methods for reducing problematic correlations between features from machine learning model data
Kevin Perkins, Champaign, IL (US)
Assigned to Verizon Patent and Licensing Inc., Basking Ridge, NJ (US)
Filed by Verizon Patent and Licensing Inc., Basking Ridge, NJ (US)
Filed on Mar. 22, 2022, as Appl. No. 17/655,838.
Prior Publication US 2023/0306280 A1, Sep. 28, 2023
Int. Cl. G06N 5/022 (2023.01)
CPC G06N 5/022 (2013.01)
20 Claims
 
1. A method, comprising:
receiving, by a device, a structured dataset, a feature of the structured dataset, and a protected categorical dimension of the structured dataset, wherein the structured dataset includes categorical data;
stratifying, by the device, the structured dataset into subsets based on the protected categorical dimension;
applying, by the device, a quantile transform to data of the subsets, to generate transformed subsets, wherein applying the quantile transform to the data of the subsets, to generate the transformed subsets, comprises:
converting the categorical data to numerical data,
adding random noise to one of the subsets based on a size threshold associated with the subsets, and
automatically selecting a quantity of bins for each of the subsets;
combining, by the device, the transformed subsets to generate a final dataset;
generating, by the device, a training dataset for a machine learning model based on the final dataset;
training, by the device, the machine learning model with the training dataset to generate a trained machine learning model to predict unbiased results based on processing biased data;
executing the trained machine learning model with the final dataset to generate one or more predictions;
determining that a prediction of the one or more predictions is incorrect; and
updating the trained machine learning model based on the incorrect prediction.
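
The stratify, transform, and combine steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The claim does not say how categories are encoded, how much noise is added, or how the quantity of bins is chosen, so the integer encoding, the noise magnitude, the `min(max_bins, subset size)` bin rule, and every function and parameter name below (`encode_categories`, `quantile_transform`, `stratified_quantile_transform`, `size_threshold`, `max_bins`, `seed`) are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def encode_categories(values):
    """Convert categorical data to numerical data (assumed scheme: integer codes)."""
    if all(isinstance(v, (int, float)) for v in values):
        return [float(v) for v in values]
    codes = {cat: i for i, cat in enumerate(sorted(set(map(str, values))))}
    return [float(codes[str(v)]) for v in values]


def quantile_transform(values, n_bins):
    """Replace each value with the midpoint of its rank-based quantile bin in [0, 1].

    Ties are broken by input order, since Python's sort is stable.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    for rank, i in enumerate(order):
        bin_index = rank * n_bins // len(values)
        out[i] = (bin_index + 0.5) / n_bins
    return out


def stratified_quantile_transform(rows, feature, protected,
                                  size_threshold=5, max_bins=10, seed=0):
    """Quantile-transform `feature` separately within each `protected` group."""
    rng = random.Random(seed)

    # Stratify the dataset into subsets based on the protected dimension.
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[protected]].append(row)

    final = []
    for subset in subsets.values():
        values = encode_categories([r[feature] for r in subset])

        # Assumed noise rule: jitter only subsets smaller than the threshold.
        if len(subset) < size_threshold:
            values = [v + rng.uniform(-1e-6, 1e-6) for v in values]

        # Assumed automatic bin selection: never more bins than rows.
        n_bins = min(max_bins, len(subset))
        transformed = quantile_transform(values, n_bins)

        # Combine the transformed subsets into the final dataset.
        for r, t in zip(subset, transformed):
            final.append({**r, feature: t})
    return final
```

For example, if group A holds the feature values 10, 20, 30, 40 and group B holds 100, 200, 300, 400, both groups come out with the same transformed values, so the feature no longer reveals group membership. A training dataset would then be drawn from the returned rows.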