CPC G06F 16/2365 (2019.01) [G06N 20/00 (2019.01)] | 20 Claims |
1. A computer-implemented method comprising:
receiving a balancing policy and an imbalanced dataset that comprises samples distributed unequally between different classes;
automatically performing initial adjustment of the imbalanced dataset to comply with the balancing policy, by:
oversampling one or more of the classes which are underrepresented in the imbalanced dataset, and
based on one or more of the classes being overrepresented in the imbalanced dataset, undersampling the one or more overrepresented classes;
operating a generative machine learning model to generate samples for the one or more underrepresented classes, based on the initially-adjusted dataset;
operating a machine learning classification model to label the generated samples with class labels corresponding to the one or more underrepresented classes;
selecting some of the generated samples which, according to the labelling, have a relatively high probability of preserving their class labels, compared to other ones of the generated samples; and
composing a balanced dataset which complies with the balancing policy and comprises:
the samples from the imbalanced dataset belonging to the one or more underrepresented classes,
the selected generated samples, and
based on one or more of the classes being overrepresented in the imbalanced dataset, an undersampled subset of the samples belonging to the one or more overrepresented classes in the imbalanced dataset.
|
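The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation, and everything specific in it is an assumption made for illustration: the balancing policy is reduced to a single hypothetical `target` count per class, the generative model is replaced by a per-class Gaussian fitted on the initially-adjusted dataset, the classification model is replaced by a softmax over distances to class means, and the names `balance`, `initial_adjust`, `fit_gaussians`, `generate`, `class_probability`, and `candidates_per_slot` are invented here.

```python
import math
import random
from collections import Counter, defaultdict


def group_by_class(dataset):
    """Group (features, label) pairs into {label: [features, ...]}."""
    groups = defaultdict(list)
    for features, label in dataset:
        groups[label].append(features)
    return groups


def initial_adjust(groups, target, rng):
    """Initial adjustment: random over/undersampling to `target` per class."""
    adjusted = {}
    for label, rows in groups.items():
        if len(rows) < target:  # underrepresented: draw extra copies with replacement
            adjusted[label] = rows + rng.choices(rows, k=target - len(rows))
        else:  # overrepresented (or already at target): draw without replacement
            adjusted[label] = rng.sample(rows, target)
    return adjusted


def fit_gaussians(adjusted):
    """Stand-in generative model (assumption): per-class feature means and stds."""
    params = {}
    for label, rows in adjusted.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1e-6
                for c, m in zip(cols, means)]
        params[label] = (means, stds)
    return params


def generate(params, label, n, rng):
    """Draw n synthetic feature tuples for one class from its fitted Gaussian."""
    means, stds = params[label]
    return [tuple(rng.gauss(m, s) for m, s in zip(means, stds)) for _ in range(n)]


def class_probability(params, features, label):
    """Stand-in classifier (assumption): softmax over negative squared distance."""
    scores = {c: -sum((x - m) ** 2 for x, m in zip(features, means))
              for c, (means, _) in params.items()}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    return exps[label] / sum(exps.values())


def balance(dataset, target, candidates_per_slot=3, seed=0):
    """Compose a dataset with exactly `target` samples per class."""
    rng = random.Random(seed)
    groups = group_by_class(dataset)
    params = fit_gaussians(initial_adjust(groups, target, rng))
    balanced = []
    for label, rows in groups.items():
        if len(rows) >= target:
            # Overrepresented class: keep an undersampled subset of originals.
            kept = rng.sample(rows, target)
        else:
            # Underrepresented class: keep all originals, then add the generated
            # samples most likely to preserve the intended class label.
            missing = target - len(rows)
            candidates = generate(params, label, missing * candidates_per_slot, rng)
            candidates.sort(key=lambda f: class_probability(params, f, label),
                            reverse=True)
            kept = rows + candidates[:missing]
        balanced.extend((features, label) for features in kept)
    return balanced


# Toy imbalanced dataset: 20 samples of class "a", 4 samples of class "b".
demo_rng = random.Random(1)
data = [((demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)), "a") for _ in range(20)]
data += [((demo_rng.gauss(5, 1), demo_rng.gauss(5, 1)), "b") for _ in range(4)]
balanced = balance(data, target=10)
print(Counter(label for _, label in balanced))
```

In this sketch the selection step over-generates by a factor of `candidates_per_slot` and keeps only the top-ranked candidates, which is one simple reading of "relatively high probability of preserving their class labels"; a fixed probability threshold would be another. By construction every class ends with `target` samples, and all original samples of an underrepresented class are retained.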