CPC G06F 18/2148 (2023.01) [G06F 3/0617 (2013.01); G06F 3/0644 (2013.01); G06F 3/065 (2013.01); G06F 3/0652 (2013.01); G06F 3/0653 (2013.01); G06F 3/0685 (2013.01); G06F 9/5016 (2013.01); G06N 3/049 (2013.01)]
20 Claims
1. A system for synthetic data generation, comprising:
at least one processor; and
at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the system to perform operations comprising:
determining a size of stored streaming data, stored in a first storage device, has reached a first threshold;
in response to the size determination, processing the stored streaming data, the processing comprising:
determining a total number of samples in the stored streaming data;
creating a plurality of bins, the bins having data ranges between a minimum and a maximum sample value;
assigning the samples to the bins, based on values of the samples and data ranges of the bins; and
determining a number of samples within the bins;
populating the bins with synthetic data, the populating comprising:
generating, by a synthetic data generator, a plurality of synthetic data points; and
assigning the synthetic data points to the bins based on values of the synthetic data points and data ranges of the bins;
determining a total number of the synthetic data points in the bins has reached a second threshold;
in response to determining that the total number has reached the second threshold, pausing the synthetic data generator;
creating a processed dataset based on the bins; and
storing the processed dataset on a second storage device.
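The operations recited in claim 1 can be illustrated with a short Python sketch. It is not the claimed implementation. The claim does not say how bins are sized, what the synthetic data generator is, or what the processed dataset contains. The sketch therefore makes several assumptions: bins have equal width and span the minimum and maximum sample values; the generator is a hypothetical uniform source (`synthetic_source`); synthetic points outside every bin's range are discarded; and a JSON file at `output_path` stands in for the second storage device. All function and parameter names are illustrative. "Pausing" is modeled by no longer drawing from a Python generator, which leaves it suspended so it can be resumed later.

```python
import itertools
import json
import random
from typing import Dict, Iterator, List, Optional


def bin_index(value: float, lo: float, hi: float, num_bins: int) -> Optional[int]:
    """Map a value to an equal-width bin over [lo, hi]; None if outside that range."""
    if value < lo or value > hi:
        return None
    if hi == lo:
        return 0  # degenerate case: every in-range value equals lo
    idx = int((value - lo) / (hi - lo) * num_bins)
    return min(idx, num_bins - 1)  # the maximum sample falls in the last bin


def synthetic_source(lo: float, hi: float, seed: int = 0) -> Iterator[float]:
    """Hypothetical synthetic data generator: uniform draws over a 10%-padded range."""
    rng = random.Random(seed)
    pad = (hi - lo) * 0.1
    while True:
        yield rng.uniform(lo - pad, hi + pad)


def process_stored_stream(
    stored: List[float],
    size_threshold: int,       # first threshold (size of stored streaming data)
    synthetic_threshold: int,  # second threshold (synthetic points placed in bins)
    num_bins: int,
    generator: Iterator[float],
    output_path: str,          # assumption: stand-in for the second storage device
    max_draws: int = 1_000_000,
) -> Optional[List[Dict[str, object]]]:
    # Do nothing until the stored streaming data reaches the first threshold.
    if len(stored) < size_threshold:
        return None

    # Bin the real samples between the minimum and maximum sample values.
    total_samples = len(stored)
    lo, hi = min(stored), max(stored)
    real: List[List[float]] = [[] for _ in range(num_bins)]
    for v in stored:
        real[bin_index(v, lo, hi, num_bins)].append(v)

    # Populate the bins with synthetic points until the second threshold is reached.
    synth: List[List[float]] = [[] for _ in range(num_bins)]
    placed = 0
    for point in itertools.islice(generator, max_draws):
        idx = bin_index(point, lo, hi, num_bins)
        if idx is None:
            continue  # assumption: out-of-range synthetic points are discarded
        synth[idx].append(point)
        placed += 1
        if placed >= synthetic_threshold:
            break  # "pause": stop drawing; the generator stays suspended

    # Assumed processed-dataset format: one record per bin.
    dataset = [
        {
            "range": [lo + (hi - lo) * i / num_bins, lo + (hi - lo) * (i + 1) / num_bins],
            "real_count": len(real[i]),
            "synthetic_count": len(synth[i]),
            "values": real[i] + synth[i],
        }
        for i in range(num_bins)
    ]
    with open(output_path, "w") as f:
        json.dump({"total_samples": total_samples, "bins": dataset}, f)
    return dataset
```

Modeling the generator as a Python iterator makes the pause explicit. `process_stored_stream` stops calling `next()` once `synthetic_threshold` points have been placed, and a later call can continue drawing from the same suspended generator. The `max_draws` cap is an added safeguard, not part of the claim, so a generator that rarely lands in range cannot loop forever.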