| CPC G06F 21/6254 (2013.01) | 20 Claims |

|
1. A system for preventing sensitive data leakage, using weak learner libraries and a plurality of environments, during label propagation, the system comprising:
one or more processors; and
one or more non-transitory, computer-readable mediums comprising instructions that, when executed by the one or more processors, cause operations comprising:
receiving a first data set at a first environment, wherein the first data set comprises a plurality of sensitive characteristics, wherein the first data set comprises actual data;
generating a second data set at a second environment, wherein the second data set is a synthetic data set corresponding to the first data set;
determining, based on the second data set at the second environment, a first weak learner for a first labeling task of a plurality of labeling tasks specific to the first data set;
validating, based on the first data set at the first environment, the first weak learner;
in response to validating the first weak learner at the first environment, adding the first weak learner to a first weak learner library for the first data set;
determining, based on the second data set at the second environment, a second weak learner for a second labeling task of the plurality of labeling tasks;
validating, based on the first data set at the first environment, the second weak learner;
adding the second weak learner to the first weak learner library in response to validating the second weak learner;
determining, for the first weak learner library, an aggregate labeling performance for the plurality of labeling tasks specific to the first data set;
comparing the aggregate labeling performance to a threshold aggregate performance; and
determining whether to approve the first weak learner library for use based on comparing the aggregate labeling performance to the threshold aggregate performance.
|
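
The operations recited in claim 1 can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the claimed implementation: the names `synthesize`, `fit_stump`, `build_library`, and `approve`, the threshold-stump form of the weak learners, the jitter-based synthetic generator, the use of accuracy as the performance measure, and the example records are all hypothetical and do not come from the claim. The sketch fits each weak learner only on the synthetic (second) data set, validates it against the actual (first) data set, and then compares the library's aggregate performance to a threshold.

```python
import random
from typing import Callable, Dict, List, Set, Tuple

Row = Dict[str, float]
WeakLearner = Callable[[Row], int]
Task = Tuple[str, str]  # (feature name, label name); hypothetical task encoding


def synthesize(real: List[Row], features: Set[str], seed: int = 0) -> List[Row]:
    """Toy stand-in for a synthetic-data generator: jitter feature values.

    Assumption for illustration only; this offers no privacy guarantee.
    """
    rng = random.Random(seed)
    return [
        {k: (v + rng.uniform(-1.0, 1.0)) if k in features else v for k, v in row.items()}
        for row in real
    ]


def accuracy(learner: WeakLearner, rows: List[Row], label: str) -> float:
    """Fraction of rows whose label the learner predicts correctly."""
    return sum(learner(r) == r[label] for r in rows) / len(rows)


def fit_stump(rows: List[Row], feature: str, label: str) -> WeakLearner:
    """Second environment: choose the threshold rule that best fits the synthetic rows."""
    values = sorted({r[feature] for r in rows})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def make(threshold: float) -> WeakLearner:
        return lambda r: int(r[feature] >= threshold)

    return max((make(t) for t in candidates), key=lambda f: accuracy(f, rows, label))


def build_library(real: List[Row], synthetic: List[Row], tasks: List[Task],
                  min_accuracy: float) -> Dict[str, WeakLearner]:
    """Add a learner to the library only if it validates on the actual data."""
    library: Dict[str, WeakLearner] = {}
    for feature, label in tasks:
        learner = fit_stump(synthetic, feature, label)      # uses synthetic data only
        if accuracy(learner, real, label) >= min_accuracy:  # validated in first environment
            library[label] = learner
    return library


def approve(library: Dict[str, WeakLearner], real: List[Row], tasks: List[Task],
            threshold: float) -> Tuple[bool, float]:
    """Compare mean per-task accuracy (tasks without a learner score 0) to a threshold."""
    scores = [
        accuracy(library[label], real, label) if label in library else 0.0
        for _, label in tasks
    ]
    aggregate = sum(scores) / len(scores)
    return aggregate >= threshold, aggregate


# Hypothetical example records; "amount" and "age" play the role of sensitive fields.
real = [
    {"amount": 100, "age": 25, "is_large": 0, "is_senior": 0},
    {"amount": 250, "age": 70, "is_large": 0, "is_senior": 1},
    {"amount": 400, "age": 30, "is_large": 0, "is_senior": 0},
    {"amount": 1200, "age": 68, "is_large": 1, "is_senior": 1},
    {"amount": 1500, "age": 40, "is_large": 1, "is_senior": 0},
    {"amount": 2000, "age": 75, "is_large": 1, "is_senior": 1},
]
tasks = [("amount", "is_large"), ("age", "is_senior")]

synthetic = synthesize(real, {"amount", "age"})
library = build_library(real, synthetic, tasks, min_accuracy=0.8)
approved, aggregate = approve(library, real, tasks, threshold=0.9)
print(approved, aggregate)
```

In this sketch the code that derives a learner never reads the actual records; only the validation and aggregate-scoring steps touch them, and they return scores rather than rows. That separation is one way to read the claim's split between the first and second environments.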