US 12,406,188 B1
System and method for evolved data augmentation and selection
Santiago Gonzalez, Denver, CO (US); Jason Zhi Liang, Fremont, CA (US); and Risto Miikkulainen, Stanford, CA (US)
Assigned to Cognizant Technology Solutions U.S. Corporation, College Station, TX (US)
Filed by Cognizant Technology Solutions U.S. Corporation, College Station, TX (US)
Filed on Mar. 5, 2021, as Appl. No. 17/193,812.
Claims priority of provisional application 62/987,138, filed on Mar. 9, 2020.
Int. Cl. G06N 3/086 (2023.01); G06F 16/215 (2019.01); G06N 3/04 (2023.01)

CPC G06N 3/086 (2013.01) [G06F 16/215 (2019.01); G06N 3/04 (2013.01)]

12 Claims

1. A process for evolving a data augmentation policy for application to sample data from a dataset, wherein the sample data is used to train a deep neural network (DNN) to perform a predetermined task, the process comprising:

evolving an initial population of candidate data augmentation policy models, wherein each initial candidate data augmentation policy model includes multiple nodes and multiple edges and further wherein each node represents a single distinct data augmentation operation, and each edge indicates a weight between two nodes, the weight representing a probability related to action by a second node on input data from a first node;

evaluating each initial candidate data augmentation policy model by:

i. applying each initial candidate data augmentation policy to the sample data to produce an augmented dataset;

ii. at least partially training the deep neural network (DNN) using the augmented sample dataset;

iii. determining a fitness for each initial candidate data augmentation policy model, wherein the candidate data augmentation policy model's fitness is accuracy of the at least partially trained deep neural network (DNN) on a held-out validation dataset from the dataset;

selecting one of (a) a final evaluated data augmentation policy or (b) one or more evaluated initial candidate data augmentation policy models for reproduction on the basis of determined fitness;

upon selection of (b), reproducing child candidate data augmentation policy models from the selected one or more evaluated initial candidate data augmentation policy models; and repeating evaluating and selecting for the child candidate data augmentation policy models until resulting selection is (a), wherein the initial population is separated into subpopulations of candidate data augmentation policy models in accordance with similarity between candidate topologies and each subpopulation is separately subjected to the evaluating, the selecting and the repeating;

applying the final evaluated data augmentation policy on sample data;

training the deep neural network using the sample data on which the final evaluated data augmentation policy has been applied.
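
The process recited in claim 1 can be illustrated with a minimal sketch; it is not the patented implementation. Everything named here is an assumption made for illustration: the toy operations in OPS (stand-ins for real augmentation operations), the dictionary layout of a policy, the use of a shared operation set as a crude proxy for topology similarity, the fixed generation count as the stopping rule, and the caller-supplied fitness_fn, which in the claimed process would partially train the DNN on the augmented data and return its accuracy on a held-out validation set.

```python
import random
from collections import defaultdict

# Hypothetical toy operations on a list of floats (stand-ins for image ops).
OPS = {
    "identity": lambda x: list(x),
    "reverse": lambda x: list(reversed(x)),
    "scale": lambda x: [1.1 * v for v in x],
    "shift": lambda x: [v + 0.1 for v in x],
}

def random_policy(rng, n_nodes=3):
    """Build a policy graph: each node is one distinct operation, and each
    forward edge (i, j) holds the probability that node j acts on node i's output."""
    nodes = rng.sample(sorted(OPS), n_nodes)
    edges = {(i, j): rng.random()
             for i in range(n_nodes) for j in range(i + 1, n_nodes)}
    return {"nodes": nodes, "edges": edges}

def apply_policy(policy, sample, rng):
    """Walk the graph from node 0, following each outgoing edge with its probability."""
    nodes, edges = policy["nodes"], policy["edges"]
    out, current = OPS[nodes[0]](sample), 0
    while True:
        for j in range(current + 1, len(nodes)):
            if rng.random() < edges[(current, j)]:
                out, current = OPS[nodes[j]](out), j
                break
        else:
            return out  # no outgoing edge was taken, so the walk ends here

def mutate(policy, rng, sigma=0.1):
    """Reproduce a child by perturbing edge probabilities, clamped to [0, 1]."""
    edges = {e: min(1.0, max(0.0, p + rng.gauss(0, sigma)))
             for e, p in policy["edges"].items()}
    return {"nodes": list(policy["nodes"]), "edges": edges}

def species_key(policy):
    # Assumption: policies with the same operation set count as "similar topologies".
    return frozenset(policy["nodes"])

def evolve(fitness_fn, rng, pop_size=12, generations=5):
    """Evolve each subpopulation separately and return the fittest evaluated policy."""
    species = defaultdict(list)
    for _ in range(pop_size):
        policy = random_policy(rng)
        species[species_key(policy)].append(policy)

    best, best_fit = None, float("-inf")
    for members in species.values():
        for _ in range(generations):
            # Evaluate every candidate once, then rank by fitness.
            scored = sorted(((fitness_fn(p), p) for p in members),
                            key=lambda t: t[0], reverse=True)
            if scored[0][0] > best_fit:
                best_fit, best = scored[0]
            # Select the top half as parents and refill with mutated children.
            parents = [p for _, p in scored[:max(1, len(scored) // 2)]]
            children = [mutate(rng.choice(parents), rng)
                        for _ in range(len(members) - len(parents))]
            members = parents + children
    return best, best_fit
```

As a usage illustration, calling evolve(fitness_fn, random.Random(0)) with any function that maps a policy to a score returns the best evaluated policy and its score. The final two steps of the claim would then correspond to running apply_policy with that policy over the training samples before fully training the network.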