| CPC G06Q 30/0631 (2013.01) [G06F 16/285 (2019.01); G06N 5/04 (2013.01); G06N 20/00 (2019.01)] | 20 Claims |

|
1. A system, comprising:
one or more memory devices storing instructions; and
one or more processors configured to execute the instructions to perform operations comprising:
categorizing historical consumer data based on a set of characteristics comprising at least one of a transactional volume, a click-through rate, or whether the historical consumer data falls below a sparsity limit;
receiving a first request to generate a first synthetic dataset as training data to a machine learning system, the first request specifying a first requirement for at least one of the characteristics;
retrieving, from the historical consumer data, a first subset of the historical consumer data satisfying the first requirement;
providing the first subset of the historical consumer data as input to a data model, the data model mapping from a random or pseudorandom vector to elements in a training data space, to generate the first synthetic dataset for the machine learning system, wherein generating the first synthetic dataset comprises determining a vector connecting a first representative point and a second representative point in code space; and
providing the first synthetic dataset as training data to the machine learning system.
|
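The operations recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. Every name in it (`ConsumerRecord`, `SPARSITY_LIMIT`, `categorize`, `retrieve_subset`, `LinearDataModel`, `generate_synthetic`) is hypothetical. The data model is assumed, for illustration only, to be a simple per-feature linear map between a two-dimensional code space and the training data space, and the two representative points are assumed to be the encodings of two records chosen from the retrieved subset.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsumerRecord:
    # Hypothetical record layout; the claim does not specify one.
    transactional_volume: float
    click_through_rate: float
    n_observations: int  # assumed basis for the sparsity check


SPARSITY_LIMIT = 5  # assumed threshold, not taken from the claim


def categorize(records, sparsity_limit=SPARSITY_LIMIT):
    """Tag each record with the characteristics named in the claim."""
    return [
        {
            "record": r,
            "transactional_volume": r.transactional_volume,
            "click_through_rate": r.click_through_rate,
            "is_sparse": r.n_observations < sparsity_limit,
        }
        for r in records
    ]


def retrieve_subset(categorized, requirement):
    """Return the records whose characteristics satisfy the requirement."""
    return [c["record"] for c in categorized if requirement(c)]


class LinearDataModel:
    """Assumed stand-in for the data model: code = (x - mean) / scale."""

    def __init__(self, subset):
        cols = list(zip(*[(r.transactional_volume, r.click_through_rate)
                          for r in subset]))
        self.mean = [sum(c) / len(c) for c in cols]
        self.scale = [
            max(1e-9, (sum((x - m) ** 2 for x in c) / len(c)) ** 0.5)
            for c, m in zip(cols, self.mean)
        ]

    def encode(self, point):
        return [(x - m) / s for x, m, s in zip(point, self.mean, self.scale)]

    def decode(self, code):
        return [m + s * z for z, m, s in zip(code, self.mean, self.scale)]


def generate_synthetic(model, first_point, second_point, n, rng, noise=0.1):
    """Sample pseudorandom codes along the vector connecting two
    representative points in code space, then decode them."""
    a = model.encode(first_point)
    b = model.encode(second_point)
    direction = [bj - aj for aj, bj in zip(a, b)]  # connecting vector
    synthetic = []
    for _ in range(n):
        t = rng.random()  # pseudorandom position along the vector
        code = [aj + t * dj + rng.gauss(0.0, noise)
                for aj, dj in zip(a, direction)]
        synthetic.append(model.decode(code))
    return synthetic, direction


records = [
    ConsumerRecord(50.0, 0.01, 10),
    ConsumerRecord(120.0, 0.05, 8),
    ConsumerRecord(300.0, 0.10, 2),   # falls below the sparsity limit
    ConsumerRecord(200.0, 0.08, 12),
    ConsumerRecord(150.0, 0.03, 20),
]

# First request: a requirement on at least one characteristic.
subset = retrieve_subset(
    categorize(records),
    lambda c: c["transactional_volume"] >= 100 and not c["is_sparse"],
)

model = LinearDataModel(subset)
first, second = (120.0, 0.05), (200.0, 0.08)  # assumed representative points
synthetic, direction = generate_synthetic(
    model, first, second, n=10, rng=random.Random(0)
)
# `synthetic` would then be provided to the machine learning system
# as training data.
```

In this sketch the connecting vector serves as an interpolation direction in code space, which is one plausible reading of the recited step; other uses of the vector, such as translating codes along it, would fit the claim language equally well.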