US 11,934,486 B2
Systems and methods for data stream using synthetic data
Anh Truong, Champaign, IL (US); Jeremy Goodsitt, Champaign, IL (US); and Austin Walters, Savoy, IL (US)
Assigned to Capital One Services, LLC, McLean, VA (US)
Filed by Capital One Services, LLC, McLean, VA (US)
Filed on Nov. 17, 2020, as Appl. No. 16/950,399.
Application 16/950,399 is a continuation of application No. 16/596,886, filed on Oct. 9, 2019, granted, now 10,860,892.
Prior Publication US 2021/0133504 A1, May 6, 2021
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 18/214 (2023.01); G06F 3/06 (2006.01); G06F 9/50 (2006.01); G06N 3/049 (2023.01)
CPC G06F 18/2148 (2023.01) [G06F 3/0617 (2013.01); G06F 3/0644 (2013.01); G06F 3/065 (2013.01); G06F 3/0652 (2013.01); G06F 3/0653 (2013.01); G06F 3/0685 (2013.01); G06F 9/5016 (2013.01); G06N 3/049 (2013.01)]
20 Claims
 
1. A system for synthetic data generation, comprising:
at least one processor; and
at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the system to perform operations comprising:
determining a size of stored streaming data, stored in a first storage device, has reached a first threshold;
in response to the size determination, processing the stored streaming data, the processing comprising:
determining a total number of samples in the stored streaming data;
creating a plurality of bins, the bins having data ranges between a minimum and a maximum sample value;
assigning the samples to the bins, based on values of the samples and data ranges of the bins; and
determining a number of samples within the bins;
populating the bins with synthetic data, the populating comprising:
generating, by a synthetic data generator, a plurality of synthetic data points; and
assigning the synthetic data points to the bins based on values of the synthetic data points and data ranges of the bins;
determining a total number of the synthetic data points in the bins has reached a second threshold;
in response to the determination of whether the total number has reached a second threshold, pausing the synthetic data generator;
creating a processed dataset based on the bins; and
storing the processed dataset on a second storage device.
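The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patentee's implementation. It rests on several assumptions not found in the claim: "size" is the number of buffered samples rather than bytes; both storage devices are modeled as in-memory lists; the bins are equal-width; and the synthetic data generator is a hypothetical Gaussian fitted to the stored samples, since the claim does not name a model. Synthetic points that fall outside the observed [minimum, maximum] range match no bin and are discarded. "Pausing" is modeled by no longer calling `next()` on a Python generator, which leaves it suspended so it could be resumed later. The names `process_stream`, `synthetic_generator`, and `bin_index` are illustrative only.

```python
import random
import statistics
from typing import Dict, Iterator, List, Optional


def synthetic_generator(samples: List[float], rng: random.Random) -> Iterator[float]:
    """Hypothetical synthetic data generator: a Gaussian fitted to the real samples."""
    mu = statistics.mean(samples)
    sigma = statistics.pstdev(samples)
    while True:
        yield rng.gauss(mu, sigma)


def bin_index(value: float, lo: float, hi: float, num_bins: int) -> Optional[int]:
    """Index of the equal-width bin holding value, or None if outside [lo, hi]."""
    if value < lo or value > hi:
        return None
    if hi == lo:
        return 0  # degenerate range: every value shares one bin
    idx = int((value - lo) / (hi - lo) * num_bins)
    return min(idx, num_bins - 1)  # the maximum sample lands in the last bin


def process_stream(
    first_storage: List[float],
    second_storage: List[Dict],
    first_threshold: int,
    second_threshold: int,
    num_bins: int = 4,
    seed: int = 0,
) -> bool:
    """Process the buffered stream once it is large enough; return True if processed."""
    # Assumption: "size" means the number of buffered samples, not bytes.
    if len(first_storage) < first_threshold:
        return False

    samples = list(first_storage)
    total_samples = len(samples)
    lo, hi = min(samples), max(samples)

    # Assign each real sample to a bin and count the samples per bin.
    real_counts = [0] * num_bins
    for s in samples:
        real_counts[bin_index(s, lo, hi, num_bins)] += 1

    # Populate the bins with synthetic points until the second threshold is reached.
    synthetic: List[List[float]] = [[] for _ in range(num_bins)]
    gen = synthetic_generator(samples, random.Random(seed))
    total_synthetic = 0
    while total_synthetic < second_threshold:
        point = next(gen)
        idx = bin_index(point, lo, hi, num_bins)
        if idx is None:
            continue  # outside [min, max]: matches no bin, so it is discarded
        synthetic[idx].append(point)
        total_synthetic += 1
    # "Pausing": we stop calling next(gen); the generator stays suspended.

    # Build the processed dataset from the bins.
    width = (hi - lo) / num_bins
    bins = [
        {
            "low": lo + i * width,
            "high": hi if i == num_bins - 1 else lo + (i + 1) * width,
            "real_count": real_counts[i],
            "synthetic_count": len(synthetic[i]),
            "synthetic_values": synthetic[i],
        }
        for i in range(num_bins)
    ]
    # Store the processed dataset on the second (separate) storage.
    second_storage.append({"total_samples": total_samples, "bins": bins})
    return True


if __name__ == "__main__":
    buffer = [1.0, 2.0, 2.5, 3.0, 4.0, 5.0, 6.5, 8.0]
    archive: List[Dict] = []
    print(process_stream(buffer, archive, first_threshold=8, second_threshold=20))  # True
    print([b["real_count"] for b in archive[0]["bins"]])  # [3, 2, 1, 2]
```

Using a Python generator keeps the "pause" cheap: the generator's state persists between `next()` calls, so a later batch could resume drawing without re-fitting. A production system would more likely gate a separate generator process or service, which this sketch does not attempt to model.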