US 12,217,295 B2
Method, medium, and system for generating synthetic data
Anh Truong, Champaign, IL (US); Austin Walters, Savoy, IL (US); and Jeremy Goodsitt, Champaign, IL (US)
Assigned to Capital One Services, LLC, McLean, VA (US)
Filed by Capital One Services, LLC, McLean, VA (US)
Filed on Jul. 21, 2020, as Appl. No. 16/935,013.
Application 16/935,013 is a continuation of application No. 16/514,000, filed on Jul. 17, 2019, granted, now 10,755,338.
Prior Publication US 2021/0019804 A1, Jan. 21, 2021
Int. Cl. G06Q 30/00 (2023.01); G06F 16/28 (2019.01); G06N 5/04 (2023.01); G06N 20/00 (2019.01); G06Q 30/0601 (2023.01)
CPC G06Q 30/0631 (2013.01) [G06F 16/285 (2019.01); G06N 5/04 (2013.01); G06N 20/00 (2019.01)]
20 Claims
 
1. A system, comprising:
one or more memory devices storing instructions; and
one or more processors configured to execute the instructions to perform operations comprising:
categorizing historical consumer data based on a set of characteristics comprising at least one of a transactional volume, a click-through rate, or whether the historical consumer data falls below a sparsity limit;
receiving a first request to generate a first synthetic dataset as training data to a machine learning system, the first request specifying a first requirement for at least one of the characteristics;
retrieving, from the historical consumer data, a first subset of the historical consumer data satisfying the first requirement;
providing the first subset of the historical consumer data as input to a data model, the data model mapping from a random or pseudorandom vector to elements in a training data space, to generate the first synthetic dataset for the machine learning system, wherein generating the first synthetic dataset comprises determining a vector connecting a first representative point and a second representative point in code space; and
providing the first synthetic dataset as training data to the machine learning system.
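
The operations recited in claim 1 can be illustrated with a minimal Python sketch. It is one possible reading of the claim language, not the patented implementation. Everything named here is an assumption made for illustration: the `Record` type, the `SPARSITY_LIMIT` value and its definition as a fraction of non-zero fields, the fixed linear `encode`/`decode` pair standing in for a trained data model, the choice of centroids as the two representative points, and the use of a one-element pseudorandom vector to pick a position along the connecting vector in code space.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical threshold: minimum fraction of non-zero fields in a record.
SPARSITY_LIMIT = 0.5


@dataclass
class Record:
    """Hypothetical historical consumer record."""
    transactional_volume: int
    click_through_rate: float
    features: List[float]  # the record as a point in training data space


def categorize(record: Record) -> Dict[str, object]:
    """Derive the characteristics named in the claim for one record."""
    filled = sum(1 for x in record.features if x != 0.0) / len(record.features)
    return {
        "transactional_volume": record.transactional_volume,
        "click_through_rate": record.click_through_rate,
        "below_sparsity_limit": filled < SPARSITY_LIMIT,
    }


def retrieve_subset(history: List[Record],
                    requirement: Callable[[Dict[str, object]], bool]) -> List[Record]:
    """Keep only the records whose characteristics satisfy the request."""
    return [r for r in history if requirement(categorize(r))]


# Stand-in for a trained data model: a fixed linear map between the
# training data space and the code space (assumption for illustration).
def encode(features: List[float]) -> List[float]:
    return [x / 10.0 for x in features]


def decode(code: List[float]) -> List[float]:
    return [c * 10.0 for c in code]


def centroid(points: List[List[float]]) -> List[float]:
    return [sum(column) / len(points) for column in zip(*points)]


def generate_synthetic(subset: List[Record], count: int, seed: int = 0) -> List[List[float]]:
    """Map pseudorandom vectors to points in training data space."""
    if len(subset) < 2:
        raise ValueError("need at least two records to form two representative points")
    rng = random.Random(seed)
    codes = sorted((encode(r.features) for r in subset), key=sum)
    half = len(codes) // 2
    # Two representative points in code space (here: centroids of two halves).
    p1, p2 = centroid(codes[:half]), centroid(codes[half:])
    # Vector connecting the first representative point to the second.
    direction = [b - a for a, b in zip(p1, p2)]
    synthetic = []
    for _ in range(count):
        z = [rng.random()]  # one-element pseudorandom vector
        code = [a + z[0] * d for a, d in zip(p1, direction)]
        synthetic.append(decode(code))
    return synthetic


history = [
    Record(250, 0.12, [10.0, 20.0]),
    Record(300, 0.08, [30.0, 40.0]),
    Record(20, 0.01, [0.0, 0.0]),
    Record(150, 0.20, [50.0, 60.0]),
]
# First request: a requirement on the transactional volume characteristic.
subset = retrieve_subset(history, lambda c: c["transactional_volume"] >= 100)
synthetic = generate_synthetic(subset, count=5)
```

In this sketch the resulting `synthetic` list would then be handed to the machine learning system as training data. A real data model would typically learn the code-space mapping rather than use a fixed scaling.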