US 12,406,093 B2
Systems and methods for preventing sensitive data leakage during label propagation
Jeremy Goodsitt, Champaign, IL (US); Michael Davis, Arlington, VA (US); Taylor Turner, Richmond, VA (US); Kenny Bean, Herndon, VA (US); and Tyler Farnan, San Diego, CA (US)
Assigned to Capital One Services, LLC, McLean, VA (US)
Filed by Capital One Services, LLC, McLean, VA (US)
Filed on Aug. 21, 2023, as Appl. No. 18/452,630.
Prior Publication US 2025/0068768 A1, Feb. 27, 2025
Int. Cl. G06F 21/00 (2013.01); G06F 21/62 (2013.01)

CPC G06F 21/6254 (2013.01)

20 Claims

1. A system for preventing sensitive data leakage, using weak learner libraries and a plurality of environments, during label propagation, the system comprising:

one or more processors; and

one or more non-transitory, computer-readable mediums comprising instructions that when executed by the one or more processors cause operations comprising:

receiving a first data set at a first environment, wherein the first data set comprises a plurality of sensitive characteristics, wherein the first data set comprises actual data;

generating a second data set at a second environment, wherein the second data set is a synthetic data set corresponding to the first data set;

determining, based on the second data set at the second environment, a first weak learner for a first labeling task of a plurality of labeling tasks specific to the first data set;

validating, based on the first data set at the first environment, the first weak learner;

in response to validating the first weak learner at the first environment, adding the first weak learner to a first weak learner library for the first data set;

determining, based on the second data set at the second environment, a second weak learner for a second labeling task of the plurality of labeling tasks;

validating, based on the first data set at the first environment, the second weak learner;

adding the second weak learner to the first weak learner library in response to validating the second weak learner;

determining, for the first weak learner library, an aggregate labeling performance for the plurality of labeling tasks specific to the first data set;

comparing the aggregate labeling performance to a threshold aggregate performance; and

determining whether to approve the first weak learner library for use based on comparing the aggregate labeling performance to the threshold aggregate performance.
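
The operations recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation; it rests on several assumptions the claim does not state: weak learners are modeled as regular-expression rules, the labeling tasks are hypothetical "ssn" and "phone" detectors, validation uses plain accuracy, and the aggregate labeling performance is taken as the mean per-task accuracy. All names (`WeakLearner`, `make_synthetic`, `determine_learner`, `build_library`, `CANDIDATE_PATTERNS`) and the sample values are invented for illustration.

```python
import random
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

Row = Tuple[str, str]  # (raw value, name of the labeling task it belongs to)

# Hypothetical candidate rules per labeling task (an assumption, not from the claim).
CANDIDATE_PATTERNS: Dict[str, List[str]] = {
    "ssn": [r"\d{3}-\d{2}-\d{4}", r"\d{9}"],
    "phone": [r"\d{3}-\d{3}-\d{4}", r"\d{10}"],
}


@dataclass
class WeakLearner:
    task: str
    pattern: str

    def predict(self, value: str) -> bool:
        # True when the whole value matches this learner's rule.
        return re.fullmatch(self.pattern, value) is not None


def accuracy(learner: WeakLearner, rows: List[Row]) -> float:
    # Fraction of rows where the prediction agrees with the row's task label.
    hits = sum(learner.predict(v) == (label == learner.task) for v, label in rows)
    return hits / len(rows)


def make_synthetic(n: int, seed: int = 0) -> List[Row]:
    # Second environment: random values that mimic the formats of the
    # actual data without containing any actual records.
    rng = random.Random(seed)

    def digits(k: int) -> str:
        return "".join(rng.choice("0123456789") for _ in range(k))

    rows: List[Row] = []
    for _ in range(n):
        rows.append((f"{digits(3)}-{digits(2)}-{digits(4)}", "ssn"))
        rows.append((f"{digits(3)}-{digits(3)}-{digits(4)}", "phone"))
    return rows


def determine_learner(task: str, synthetic: List[Row]) -> WeakLearner:
    # Second environment: pick the candidate rule that scores best on synthetic data.
    candidates = [WeakLearner(task, p) for p in CANDIDATE_PATTERNS[task]]
    return max(candidates, key=lambda c: accuracy(c, synthetic))


def build_library(real: List[Row], synthetic: List[Row], tasks: List[str],
                  per_learner_min: float, aggregate_min: float):
    library: Dict[str, WeakLearner] = {}
    scores: Dict[str, float] = {}
    for task in tasks:
        learner = determine_learner(task, synthetic)  # uses synthetic data only
        score = accuracy(learner, real)               # validated in the first environment
        if score >= per_learner_min:                  # add only validated learners
            library[task] = learner
            scores[task] = score
    # Assumed aggregate: mean accuracy over all tasks; a task with no
    # validated learner contributes 0.
    aggregate = sum(scores.get(t, 0.0) for t in tasks) / len(tasks)
    approved = aggregate >= aggregate_min             # compare to the threshold
    return library, aggregate, approved


# Made-up stand-in for the actual data held in the first environment.
real_rows = [("123-45-6789", "ssn"), ("555-867-5309", "phone"), ("hello", "other")]
library, aggregate, approved = build_library(
    real_rows, make_synthetic(20), ["ssn", "phone"],
    per_learner_min=0.9, aggregate_min=0.9,
)
```

In this sketch the rule-selection step only ever sees synthetic rows, while the actual rows are touched solely to compute validation scores, which mirrors the separation between the second and first environments in the claim.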