US 12,346,776 B2
Training data augmentation for machine learning
Itay Margolin, Pardesiya (IL)
Assigned to PayPal, Inc., San Jose, CA (US)
Filed by PayPal, Inc., San Jose, CA (US)
Filed on Nov. 19, 2020, as Appl. No. 16/953,030.
Prior Publication US 2022/0156634 A1, May 19, 2022
Int. Cl. G06N 20/00 (2019.01); G06F 21/52 (2013.01); G06F 21/62 (2013.01)
CPC G06N 20/00 (2019.01) [G06F 21/6245 (2013.01); G06F 21/52 (2013.01)]
20 Claims
 
1. A method, comprising:
generating, by a computer system, synthetic samples for a trained machine learning model usable to make a classification decision, wherein the generating includes:
removing, based on a rule specifying a particular feature, the particular feature from a set of existing samples to generate a reduced-feature set of training samples, wherein the removing is performed based on the particular feature failing to comply with the rule, and wherein the particular feature is associated with biased classification decisions in the trained machine learning model;
selecting a subset of the reduced-feature set of training samples having classification decisions that exceed a confidence threshold, wherein the subset includes fewer training samples than the reduced-feature set of training samples; and
reinserting the particular feature into samples in the selected subset, wherein values for the reinserted particular feature in samples in the selected subset are different than values of the particular feature of corresponding samples in the set of existing samples prior to the removing; and
retraining, by the computer system, the trained machine learning model using the synthetic samples that include new values for the particular feature that is associated with biased classification decisions, wherein the retraining reduces bias in the trained machine learning model; and
executing, by the computer system, the retrained machine learning model to generate unbiased classifications for one or more new samples.
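
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the feature names `signal` and `group`, the `DISALLOWED` rule, the 0.8 threshold, the toy data, and the small logistic regression are all assumptions made for illustration. Two further assumptions go beyond the claim text: a helper model trained on the reduced-feature samples supplies the confidence scores, and the synthetic samples are appended to the existing samples for retraining.

```python
import math

FEATURES = ["signal", "group"]   # hypothetical feature names
DISALLOWED = {"group"}           # hypothetical rule: listed features fail to comply
CONFIDENCE_THRESHOLD = 0.8       # hypothetical threshold

def predict_proba(w, row):
    """Probability of class 1 under logistic weights w (bias stored last)."""
    z = sum(wj * v for wj, v in zip(w, row + [1.0]))
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

def confidence(w, row):
    """Confidence of the classification decision, whichever class it is."""
    p = predict_proba(w, row)
    return max(p, 1.0 - p)

def train_logreg(X, y, epochs=300, lr=0.5):
    """Fit a small logistic regression by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, label in zip(X, y):
            err = predict_proba(w, row) - label
            for j, v in enumerate(row + [1.0]):
                grad[j] += err * v
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

# Toy existing samples: "signal" determines the label, while "group" agrees
# with the label 90% of the time, so a model can learn to depend on it.
signals = [(i - 99.5) / 100 for i in range(200)]
y = [1 if s > 0 else 0 for s in signals]
groups = [lab if i % 10 else 1 - lab for i, lab in enumerate(y)]
X = [[s, float(g)] for s, g in zip(signals, groups)]
model = train_logreg(X, y)       # stands in for the already-trained model

# Step 1: remove the feature that fails to comply with the rule.
idx = next(i for i, name in enumerate(FEATURES) if name in DISALLOWED)
reduced = [row[:idx] + row[idx + 1:] for row in X]

# Step 2: keep only samples whose decision exceeds the confidence threshold.
# Assumption: a helper model trained on the reduced features scores them.
helper = train_logreg(reduced, y)
selected = [i for i, row in enumerate(reduced)
            if confidence(helper, row) > CONFIDENCE_THRESHOLD]

# Step 3: reinsert the feature with a value that differs from the original.
synthetic_X, synthetic_y = [], []
for i in selected:
    new_value = 1.0 - X[i][idx]  # flip the binary feature
    synthetic_X.append(reduced[i][:idx] + [new_value] + reduced[i][idx:])
    synthetic_y.append(1 if predict_proba(helper, reduced[i]) >= 0.5 else 0)

# Step 4: retrain. Assumption: synthetic samples are added to existing ones.
retrained = train_logreg(X + synthetic_X, y + synthetic_y)

# Step 5: run the retrained model on new samples that differ only in "group".
new_samples = [[0.3, 0.0], [0.3, 1.0]]
decisions = [predict_proba(retrained, s) >= 0.5 for s in new_samples]
print("weight on removed feature before:", round(model[idx], 3))
print("weight on removed feature after: ", round(retrained[idx], 3))
print("decisions for new samples:", decisions)
```

In this toy setup, comparing the two printed weights gives a rough indication of how much the retrained model still relies on the reinserted feature. A smaller weight suggests reduced dependence, but it does not by itself show that the resulting classifications are unbiased.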