US 11,657,292 B1
Systems and methods for machine learning dataset generation
Paul Nicotera, Eden Prairie, MN (US); and Mandeep Singh, Eden Prairie, MN (US)
Assigned to ARCHITECTURE TECHNOLOGY CORPORATION, Eden Prairie, MN (US)
Filed by ARCHITECTURE TECHNOLOGY CORPORATION, Eden Prairie, MN (US)
Filed on Jan. 15, 2020, as Appl. No. 16/743,977.
Int. Cl. G06N 3/08 (2023.01); G06N 3/04 (2023.01); G06N 3/088 (2023.01)
CPC G06N 3/088 (2013.01) [G06N 3/0454 (2013.01)] 20 Claims
1. A computer-implemented method comprising:
receiving, by a server, a set of seed examples in a domain comprising a limited number of datasets of a same data type;
generating, by the server, candidate labeled datasets according to features of the set of seed examples;
training, by the server, a dataset generator in the domain by iteratively:
executing, by the server, a label discriminator that identifies and rejects mislabeled datasets included in the candidate labeled datasets;
executing, by the server, a domain discriminator that identifies and rejects datasets that are out of the domain from the candidate labeled datasets;
regenerating, by the server, new candidate labeled datasets based on results of the label discriminator and the domain discriminator,
wherein the server iteratively executes the label discriminator and the domain discriminator and regenerates the new candidate labeled datasets until attaining a pass rate of each of the label discriminator and the domain discriminator satisfying a threshold; and
storing, by the server, the trained dataset generator and the corresponding domain into a database.
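The iterative training loop of claim 1 can be illustrated with a toy sketch. This is a hypothetical example under stated assumptions, not the patented implementation. It assumes one-dimensional numeric seed examples with "low"/"high" labels. A nearest-seed rule stands in for the label discriminator, a seed-range check stands in for the domain discriminator, and an in-memory sqlite3 table stands in for the database. Every name here is an assumption introduced for illustration: `ToyGenerator`, `label_ok`, `domain_ok`, `train_generator`, `store`, the `"toy-1d"` domain, and the 0.95 pass-rate threshold. The generator is updated from the rejection results and then regenerates a fresh candidate batch. The loop stops once both pass rates satisfy the threshold.

```python
import json
import random
import sqlite3
import statistics
from dataclasses import asdict, dataclass

# A labeled example: (feature value, class label). Illustrative only.
Example = tuple[float, str]


@dataclass
class ToyGenerator:
    """Hypothetical 1-D generator: Gaussian samples around `center`,
    labeled "high" or "low" by a learnable `boundary`."""
    center: float
    spread: float
    boundary: float

    def generate(self, n: int, rng: random.Random) -> list[Example]:
        xs = [rng.gauss(self.center, self.spread) for _ in range(n)]
        return [(x, "high" if x >= self.boundary else "low") for x in xs]


def label_ok(ex: Example, seeds: list[Example]) -> bool:
    """Toy label discriminator: the label must match the nearest seed's label."""
    x, label = ex
    nearest = min(seeds, key=lambda s: abs(s[0] - x))
    return label == nearest[1]


def domain_ok(ex: Example, lo: float, hi: float) -> bool:
    """Toy domain discriminator: the value must lie inside the seed range."""
    return lo <= ex[0] <= hi


def train_generator(seeds: list[Example], threshold: float = 0.95,
                    batch: int = 500, max_iters: int = 100, seed: int = 0):
    rng = random.Random(seed)
    xs = [x for x, _ in seeds]
    lo, hi = min(xs), max(xs)
    # Initial candidates follow the seed features; the boundary starts as a random guess.
    gen = ToyGenerator(statistics.mean(xs), 2 * statistics.pstdev(xs), rng.uniform(lo, hi))
    for iteration in range(1, max_iters + 1):
        candidates = gen.generate(batch, rng)
        label_rejects = [ex for ex in candidates if not label_ok(ex, seeds)]
        domain_rejects = [ex for ex in candidates if not domain_ok(ex, lo, hi)]
        label_rate = 1 - len(label_rejects) / batch
        domain_rate = 1 - len(domain_rejects) / batch
        if label_rate >= threshold and domain_rate >= threshold:
            accepted = [ex for ex in candidates
                        if label_ok(ex, seeds) and domain_ok(ex, lo, hi)]
            return gen, iteration, label_rate, domain_rate, accepted
        # Use the discriminators' feedback to adjust the generator; the next
        # pass of the loop regenerates a fresh candidate batch.
        if label_rejects:
            # Mislabeled values lie between the current and the correct boundary,
            # so moving to their mean shrinks the gap.
            gen.boundary = statistics.mean([x for x, _ in label_rejects])
        if domain_rate < threshold:
            gen.spread *= 0.8  # too many out-of-domain samples: sample more tightly
    raise RuntimeError("pass-rate threshold not reached within max_iters")


def store(conn: sqlite3.Connection, domain: str, gen: ToyGenerator) -> None:
    """Persist the trained generator's parameters keyed by its domain."""
    conn.execute("CREATE TABLE IF NOT EXISTS generators (domain TEXT PRIMARY KEY, params TEXT)")
    conn.execute("INSERT OR REPLACE INTO generators VALUES (?, ?)",
                 (domain, json.dumps(asdict(gen))))
    conn.commit()


seeds = [(1.0, "low"), (2.0, "low"), (3.0, "low"),
         (7.0, "high"), (8.0, "high"), (9.0, "high")]
gen, iters, label_rate, domain_rate, accepted = train_generator(seeds)
conn = sqlite3.connect(":memory:")
store(conn, "toy-1d", gen)
print(f"converged after {iters} iterations: "
      f"label pass {label_rate:.3f}, domain pass {domain_rate:.3f}")
```

In this sketch the discriminators are fixed rules rather than trained models. That choice keeps the example self-contained and deterministic. A fuller version would more likely use learned classifiers for both discriminators. The `max_iters` guard is an added assumption that keeps the loop from running forever if the threshold is never reached.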