US 12,217,875 B1
Feature prediction for minority class data augmentation
Jonathan Mugan, Buda, TX (US); and Mallika Thanky, Chicago, IL (US)
Assigned to Pulselight Holdings, Inc., Austin, TX (US)
Filed by Pulselight Holdings, Inc., Austin, TX (US)
Filed on Oct. 31, 2022, as Appl. No. 17/978,009.
Application 17/978,009 is a continuation of application No. 16/430,035, filed on Jun. 3, 2019, granted, now 11,488,723.
Claims priority of provisional application 62/680,431, filed on Jun. 4, 2018.
This patent is subject to a terminal disclaimer.
Int. Cl. G16H 10/60 (2018.01); G06F 17/18 (2006.01); G16B 40/00 (2019.01); G16H 50/20 (2018.01); G16H 50/70 (2018.01)
CPC G16H 50/70 (2018.01) [G06F 17/18 (2013.01); G16B 40/00 (2019.02); G16H 10/60 (2018.01); G16H 50/20 (2018.01)] 20 Claims
 
1. A computer-implemented method of generating synthetic minority-class training records for machine learning, the method performed by a computer system, said computer system comprising one or more processors and computer-usable non-transitory storage media operationally coupled to the one or more processors, comprising:
storing in the non-transitory storage media a plurality of original minority-class training records, including a first minority-class training record, wherein each of the plurality of original minority-class training records is labeled with a same first label and comprises a feature value for each of a plurality of features, including a first feature, and wherein the first minority-class training record comprises a first feature value for the first feature;
using a computational process performed by the one or more processors executing software instructions stored in the computer-usable non-transitory storage media, determining that the probability of the first feature having a different second feature value in the first minority-class training record exceeds a pre-determined probability threshold; and
generating a first synthetic minority-class training record from the first minority-class training record, comprising changing the feature value of the first feature in the first minority-class training record from the first feature value to the second feature value, and storing the modified version of the first minority-class training record as the first synthetic minority-class training record in the non-transitory storage media, thereby augmenting the plurality of original minority-class training records with the first synthetic minority-class training record.
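The steps recited in claim 1 can be illustrated with a minimal Python sketch. The claim does not specify how the probability of an alternative feature value is computed, so the estimator here is an assumption made purely for illustration: an empirical conditional frequency among minority-class records that share one hypothetical "context" feature with the record being examined. The names `value_probabilities`, `generate_synthetic`, `context_feature`, and the sample field names are all hypothetical and do not come from the patent.

```python
from collections import Counter
from typing import Dict, List

Record = Dict[str, str]


def value_probabilities(records: List[Record], record: Record,
                        feature: str, context_feature: str) -> Dict[str, float]:
    """Assumed estimator: empirical P(feature = v | context_feature matches).

    Only minority-class records sharing the record's context value are counted.
    The record itself is always among them, so the total is never zero.
    """
    matches = [r for r in records if r[context_feature] == record[context_feature]]
    counts = Counter(r[feature] for r in matches)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def generate_synthetic(records: List[Record], feature: str,
                       context_feature: str, threshold: float) -> List[Record]:
    """Return modified copies of records whose alternative value exceeds the threshold."""
    synthetic = []
    for record in records:
        probs = value_probabilities(records, record, feature, context_feature)
        for value, prob in probs.items():
            # Only a *different* second value that clears the threshold qualifies.
            if value != record[feature] and prob > threshold:
                modified = dict(record)    # copy, so the original record is kept
                modified[feature] = value  # change first value to second value
                synthetic.append(modified)
    return synthetic


# Illustrative minority-class records, all carrying the same label.
minority = [
    {"label": "pos", "age_band": "60+", "smoker": "yes"},
    {"label": "pos", "age_band": "60+", "smoker": "yes"},
    {"label": "pos", "age_band": "60+", "smoker": "no"},
    {"label": "pos", "age_band": "40-59", "smoker": "no"},
]

new_records = generate_synthetic(minority, "smoker", "age_band", threshold=0.5)
augmented = minority + new_records  # originals plus synthetic records
```

In this sketch the in-memory list `augmented` stands in for the storing step; a real system would persist the synthetic records to whatever storage holds the training set. Each synthetic record keeps the original label, so the result adds minority-class examples without altering the originals.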