US 11,907,816 B2
Entropy based synthetic data generation for augmenting classification system training data
Pinkesh Badjatiya, Ujjain (IN); Nikaash Puri, New Delhi (IN); Ayush Chopra, Pitampura (IN); and Anubha Kabra, Jaipur (IN)
Assigned to Adobe Inc., San Jose, CA (US)
Filed by Adobe Inc., San Jose, CA (US)
Filed on Aug. 22, 2022, as Appl. No. 17/892,878.
Application 17/892,878 is a continuation of application No. 16/659,147, filed on Oct. 21, 2019, granted, now 11,423,264.
Prior Publication US 2023/0196191 A1, Jun. 22, 2023
Int. Cl. G06N 20/00 (2019.01); G06N 20/10 (2019.01); G06F 18/2431 (2023.01); G06F 18/211 (2023.01); G06F 18/214 (2023.01); G06F 18/2453 (2023.01)
CPC G06N 20/00 (2019.01) [G06F 18/211 (2023.01); G06F 18/214 (2023.01); G06F 18/2431 (2023.01); G06F 18/2453 (2023.01); G06N 20/10 (2019.01)]
20 Claims
 
1. A method comprising:
selecting, based on entropies of at least two training instances in a training data set used to train a data classification system, a first training instance from a first subset of the training data set that includes multiple training instances that are in a minority class and a second training instance from a second subset of the training data set that includes multiple training instances that are in a majority class;
generating a synthetic training instance by combining the first training instance and the second training instance;
generating a synthetic training label for the synthetic training instance based on a training label for the first training instance and a training label for the second training instance;
augmenting the training data set with the synthetic training instance and the synthetic training label, resulting in an augmented training data set; and
re-training the data classification system by applying the augmented training data set to the data classification system and adjusting weights within the data classification system based on the augmented training data set.
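The steps recited in claim 1 can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the claim: the classifier is a small logistic regression, "entropy" is taken to be the predictive entropy of the current model's output, the highest-entropy instance in each class subset is the one selected, and the combination is a convex mix with a hypothetical weight `lam`. The function names (`train`, `predictive_entropy`, `augment_once`) are likewise hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(X):
    # Append a constant column so the last weight acts as the bias term.
    return np.hstack([X, np.ones((len(X), 1))])

def train(X, y, w=None, lr=0.1, epochs=200):
    """Fit, or continue fitting, logistic-regression weights by gradient descent.
    y may contain soft labels in [0, 1]."""
    Xb = add_bias(X)
    if w is None:
        w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)
        w = w - lr * Xb.T @ (p - y) / len(y)  # cross-entropy gradient step
    return w

def predictive_entropy(X, w):
    """Binary entropy of the model's predicted probability for each instance."""
    p = np.clip(sigmoid(add_bias(X) @ w), 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def augment_once(X, y, w, lam=0.5):
    """Add one synthetic instance built from a minority and a majority instance.
    Assumes y holds hard 0/1 labels (call before any soft labels are added)."""
    ent = predictive_entropy(X, w)
    minority_label = 1 if (y == 1).sum() < (y == 0).sum() else 0
    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)
    # Assumed selection rule: highest-entropy instance in each subset.
    i = minority_idx[np.argmax(ent[minority_idx])]
    j = majority_idx[np.argmax(ent[majority_idx])]
    # Assumed combination rule: convex mix of features and of labels.
    x_syn = lam * X[i] + (1 - lam) * X[j]
    y_syn = lam * y[i] + (1 - lam) * y[j]
    return np.vstack([X, x_syn]), np.append(y, y_syn), (i, j)

# Toy imbalanced data: 40 majority (label 0) and 8 minority (label 1) instances.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (40, 2)), rng.normal(2.0, 1.0, (8, 2))])
y = np.concatenate([np.zeros(40), np.ones(8)])

w = train(X, y)                               # initial training
X_aug, y_aug, (i, j) = augment_once(X, y, w)  # select, combine, label, augment
w_new = train(X_aug, y_aug, w=w)              # re-train, adjusting the weights
```

With `lam=0.5` the synthetic label is a soft label halfway between the two class labels. Other selection rules or mixing weights would fit the claim language equally well; the choices here only keep the sketch short.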