| CPC G06N 3/0475 (2023.01) [G06F 18/214 (2023.01); G06N 3/045 (2023.01)] | 17 Claims |

|
1. A method implemented by a computer for generating synthetic data by a data collection system, the method comprising:
collecting data including sensitive data and non-sensitive data associated with a network and subscribers of the network;
organizing the data in a tabular form in time series into subsets and identifying subsets containing the sensitive data;
executing a first generative adversarial network (GAN) to generate the synthetic data from the organized data where the synthetic data has characteristics similar to the collected data and does not exceed a first threshold, wherein the first threshold determines a maximum percentage of data that is classified as synthetic data;
executing a second GAN to update the synthetic data so that a discriminator in the second GAN does not predict the sensitive data based on a second threshold, in order to anonymize the sensitive data from being recovered from updated synthetic data, wherein the second threshold determines a percentage of sensitive data that can be retrieved from the updated synthetic data;
checking whether the updated synthetic data meets the first threshold;
releasing the updated synthetic data where the first threshold is met; and
re-executing the first GAN and the second GAN to further update the updated synthetic data where the first threshold is not met during the checking.
|
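The control flow recited in claim 1 (run the first GAN, run the second GAN, check the first threshold, then release or re-execute) can be illustrated with a minimal sketch. It is not the patented implementation. The names `release_loop`, `run_first_gan`, `run_second_gan`, `flagged_as_synthetic`, and the `max_rounds` cap are all hypothetical; the claim names no functions and sets no iteration limit. Both GANs are passed in as plain callables, and the second callable is assumed to enforce the second threshold internally.

```python
from typing import Any, Callable, Optional, Tuple


def release_loop(
    organized: Any,
    run_first_gan: Callable[[Any, Optional[Any]], Any],
    run_second_gan: Callable[[Any, Any], Any],
    flagged_as_synthetic: Callable[[Any], float],
    first_threshold: float,
    max_rounds: int = 10,  # assumption: the claim does not cap the number of rounds
) -> Tuple[Any, int]:
    """Repeat both GAN passes until the first threshold is met, then release."""
    synthetic = None
    for round_no in range(1, max_rounds + 1):
        # First GAN: generate (or further update) synthetic data from the organized data.
        synthetic = run_first_gan(organized, synthetic)
        # Second GAN: update the synthetic data so sensitive data is hard to recover.
        synthetic = run_second_gan(organized, synthetic)
        # Check: fraction classified as synthetic must not exceed the first threshold.
        if flagged_as_synthetic(synthetic) <= first_threshold:
            return synthetic, round_no  # release
    raise RuntimeError("first threshold not met within max_rounds")


# Toy stand-ins (hypothetical) so the loop can be exercised without real models.
def toy_first_gan(organized: Any, previous: Optional[dict]) -> dict:
    # Each round halves the fraction a classifier would flag as synthetic.
    flagged = 0.40 if previous is None else previous["flagged"] * 0.5
    return {"flagged": flagged}


def toy_second_gan(organized: Any, synthetic: dict) -> dict:
    # A real second GAN would alter the data; this stub only marks it anonymized.
    return {**synthetic, "anonymized": True}


if __name__ == "__main__":
    data, rounds = release_loop(
        organized="organized-subsets",
        run_first_gan=toy_first_gan,
        run_second_gan=toy_second_gan,
        flagged_as_synthetic=lambda s: s["flagged"],
        first_threshold=0.15,
    )
    print(rounds, data)
```

With the toy stand-ins, the flagged fraction falls from 0.40 to 0.20 to 0.10, so the loop releases on the third round when the first threshold is 0.15. The cap on rounds is a design choice of this sketch only, added so that a threshold that can never be met ends in an error rather than an endless loop.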