US 12,443,676 B2
Controlling a bias of a machine learning module background
Lukasz G. Cmielowski, Cracow (PL); Szymon Kucharczyk, Cracow (PL); Dorota Lączak, Mielec (PL); and Daniel Jakub Ryszka, Cracow (PL)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Oct. 13, 2021, as Appl. No. 17/450,694.
Prior Publication US 2023/0115067 A1, Apr. 13, 2023
Int. Cl. G06N 20/00 (2019.01); G06F 18/211 (2023.01); G06F 18/2113 (2023.01); G06F 18/214 (2023.01); G06F 18/40 (2023.01); G06N 3/08 (2023.01)
CPC G06F 18/211 (2023.01) [G06F 18/40 (2023.01); G06N 20/00 (2019.01)]
10 Claims
 
1. A computer-implemented method, the method comprising:
receiving a plurality of data records from a plurality of networked entities, wherein each data record comprises a value corresponding to a first feature, a value corresponding to a second feature, and a target value;
selecting either the first feature or the second feature as a selected feature;
updating each data record with a generated further feature, wherein the further feature is calculated as either: a weighted ratio of the first feature to the second feature, or a logarithmic function of the first feature;
calculating a first correlation metric as a measure of a correlation between the selected feature and the further feature;
training a machine learning module (ML-module) until a training error is below a training error threshold, wherein the training comprises repeatedly:
inputting each updated data record as training data to the ML-module, resulting in a respective change of a value of each model parameter, a training error, and training output values, each training output value corresponding to each input further feature;
sorting the values of the generated further feature in ascending order, and dividing the sorted values of the generated further feature into a first subset dataset and a second subset dataset;
determining a bias metric indicating a strength of a bias of the ML-module toward the first subset dataset or the second subset dataset, based on the correlation metric being greater than a threshold; and
releasing the ML-module for usage if the determined bias metric satisfies a bias constraint, wherein the releasing comprises storing the model parameters and structural parameters of the ML-module, storing the correlation metrics, and bias metrics in a form of meta data of the ML-module.
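
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation and rests on several assumptions the claim does not state: records are dictionaries with the hypothetical keys `f1`, `f2` and `target`; the correlation metric is a Pearson coefficient; the ML-module is a one-input linear model trained by gradient descent; the sorted values are split at the midpoint; the bias metric is the difference in mean absolute error between the two subsets; and the bias constraint is an upper limit on its absolute value. For brevity the bias is evaluated once after training rather than on every training iteration, and all function names are illustrative.

```python
import json
import math


def add_further_feature(records, weight=1.0, use_log=False):
    """Add the generated feature: weighted ratio f1/f2, or log(f1)."""
    out = []
    for r in records:
        value = math.log(r["f1"]) if use_log else weight * r["f1"] / r["f2"]
        out.append({**r, "further": value})
    return out


def pearson(xs, ys):
    """Pearson correlation coefficient (assumed form of the correlation metric)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def predict(params, x):
    return params["w"] * x + params["b"]


def mean_squared_error(params, records):
    errs = [predict(params, r["further"]) - r["target"] for r in records]
    return sum(e * e for e in errs) / len(errs)


def train(records, error_threshold, lr=0.05, max_epochs=10000):
    """Repeat parameter updates until the training error is below the threshold."""
    params = {"w": 0.0, "b": 0.0}
    n = len(records)
    for _ in range(max_epochs):
        if mean_squared_error(params, records) < error_threshold:
            break
        errs = [predict(params, r["further"]) - r["target"] for r in records]
        grad_w = 2 * sum(e * r["further"] for e, r in zip(errs, records)) / n
        grad_b = 2 * sum(errs) / n
        params = {"w": params["w"] - lr * grad_w, "b": params["b"] - lr * grad_b}
    return params


def bias_metric(params, records):
    """Sort by the generated feature, split in two, compare mean absolute errors."""
    ordered = sorted(records, key=lambda r: r["further"])
    mid = len(ordered) // 2  # midpoint split is an assumption

    def mae(subset):
        return sum(abs(predict(params, r["further"]) - r["target"]) for r in subset) / len(subset)

    return mae(ordered[:mid]) - mae(ordered[mid:])


def release(records, selected="f1", corr_threshold=0.5,
            bias_limit=0.1, error_threshold=1e-4):
    """Return the released model as JSON, or None if the bias constraint fails."""
    updated = add_further_feature(records)
    corr = pearson([r[selected] for r in updated], [r["further"] for r in updated])
    params = train(updated, error_threshold)
    # The bias metric is determined only when the correlation exceeds the threshold.
    bias = bias_metric(params, updated) if abs(corr) > corr_threshold else None
    if bias is not None and abs(bias) > bias_limit:
        return None
    return json.dumps({
        "parameters": params,
        "structure": {"type": "linear", "inputs": ["further"]},
        "metadata": {"correlation": corr, "bias": bias},
    })
```

Calling `release(records)` on a list of such dictionaries returns a JSON string holding the model parameters, a structural description, and the correlation and bias metrics as metadata, or `None` when the assumed bias constraint is not met.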