US 12,468,739 B2
Detection of susceptibility of membership inference attacks on synthetic data
André Castro, Lisbon (PT); David Clyde Williamson, Great Missenden (GB); Vichai Levy, Norwalk, CT (US); and Chandan Chaitanya, Telangana (IN)
Assigned to Protegrity US Holding, LLC, Stamford, CT (US)
Filed by Protegrity US Holding, LLC, Stamford, CT (US)
Filed on Oct. 11, 2024, as Appl. No. 18/912,985.
Claims priority of provisional application 63/593,536, filed on Oct. 27, 2023.
Claims priority of provisional application 63/593,535, filed on Oct. 27, 2023.
Prior Publication US 2025/0139132 A1, May 1, 2025
Int. Cl. G06F 16/00 (2019.01); G06F 16/28 (2019.01); G06F 21/62 (2013.01)
CPC G06F 16/285 (2019.01) [G06F 21/6254 (2013.01)]
20 Claims
 
1. A method comprising:
accessing a database comprising a set of rows each corresponding to a record and a set of columns each corresponding to an attribute;
splitting the accessed database into a first training database and a first holdout database;
applying a synthetic data engine to the first training database to generate a synthetic database, each synthetic record in the synthetic database comprising fabricated data produced based on one or more records of the first training database;
applying a machine learning model to the synthetic database to produce a measure of confidence that each synthetic record in the synthetic database is a record in the accessed database, the machine learning model configured to classify an input record as one or more of the records in the accessed database;
generating an intermediary database comprising:
records of the accessed database,
attributes within the accessed database determined to be quasi-identifier attributes,
synthetic attributes corresponding to a threshold number of synthetic records associated with the greatest measures of confidence, and
a column indicating whether each record is included in the first training database;
splitting the intermediary database into a second training database and a second holdout database;
training a machine learning binary classifier using the second training database, the machine learning binary classifier configured to classify an input record as present or absent within the first training database;
applying the machine learning binary classifier to the second holdout database to predict which records within the second holdout database are within the first training database; and
in response to the machine learning binary classifier successfully identifying which records within the second holdout database are within the first training database, flagging the accessed database as susceptible to a membership inference attack.
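
The steps of claim 1 can be sketched end to end in a few dozen lines. The following is a minimal illustration under stated assumptions, not the patented implementation: the synthetic data engine is replaced by a hypothetical noise-jittering function, the confidence model by a nearest-record closeness score, and the binary classifier by a one-feature decision stump. The names `assess`, `synthesize`, `confidence`, the `gap` feature, and the 0.8 accuracy cutoff used to decide that the classifier "successfully" identified members are all inventions for this example.

```python
import math
import random


def split(rows, fraction, rng):
    """Shuffle a copy of rows and split it into two lists."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


def distance(a, b, columns):
    """Euclidean distance between two records over the given columns."""
    return math.sqrt(sum((a[c] - b[c]) ** 2 for c in columns))


def synthesize(train, columns, noise, rng):
    """Hypothetical stand-in for a synthetic data engine: jitter each record."""
    return [{c: r[c] + rng.gauss(0, noise) for c in columns} for r in train]


def confidence(synthetic_row, database, columns):
    """Assumed confidence measure: closeness to the nearest real record."""
    nearest = min(distance(synthetic_row, r, columns) for r in database)
    return 1.0 / (1.0 + nearest)


def assess(database, quasi_ids, top_k=20, noise=0.05, cutoff=0.8, seed=0):
    rng = random.Random(seed)
    columns = list(database[0].keys())

    # First split: training database and holdout database.
    train1, _holdout1 = split(database, 0.5, rng)
    train_ids = {id(r) for r in train1}

    # Generate synthetic records and keep the top_k by confidence.
    synthetic = synthesize(train1, columns, noise, rng)
    ranked = sorted(synthetic,
                    key=lambda s: confidence(s, database, columns),
                    reverse=True)
    top = ranked[:top_k]

    # Intermediary database: quasi-identifiers, attributes of the closest
    # high-confidence synthetic record, and the membership column.
    intermediary = []
    for r in database:
        nearest = min(top, key=lambda s: distance(r, s, quasi_ids))
        intermediary.append({
            "quasi": {c: r[c] for c in quasi_ids},
            "synthetic": {c: nearest[c] for c in quasi_ids},
            "gap": distance(r, nearest, quasi_ids),  # assumed feature
            "in_train": id(r) in train_ids,
        })

    # Second split, then fit a decision stump on the "gap" feature.
    train2, holdout2 = split(intermediary, 0.7, rng)

    def accuracy(rows, cut):
        hits = sum((row["gap"] <= cut) == row["in_train"] for row in rows)
        return hits / len(rows)

    candidates = [-1.0] + [row["gap"] for row in train2]
    cut = max(candidates, key=lambda c: accuracy(train2, c))

    # Evaluate on the second holdout and flag if membership is predictable.
    score = accuracy(holdout2, cut)
    return {"accuracy": score, "susceptible": score >= cutoff}
```

Calling `assess(rows, ["age", "zip"])` on a list of numeric dictionaries returns the holdout accuracy of the membership classifier and a boolean flag. In this sketch, setting `noise=0.0` makes the stand-in engine copy training records verbatim, which is the extreme case the claimed method is meant to catch; a real synthetic data engine and a stronger classifier would take the place of the stand-ins.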