US 12,455,904 B2
Automatic quasi-identifier detection and recommendations
André Castro, Lisbon (PT); David Clyde Williamson, Great Missenden (GB); Vichai Levy, Norwalk, CT (US); and Chandan Chaitanya, Telangana (IN)
Assigned to Protegrity US Holding, LLC, Stamford, CT (US)
Filed by Protegrity US Holding, LLC, Stamford, CT (US)
Filed on Oct. 11, 2024, as Appl. No. 18/912,980.
Claims priority of provisional application 63/593,535, filed on Oct. 27, 2023.
Claims priority of provisional application 63/593,536, filed on Oct. 27, 2023.
Prior Publication US 2025/0139131 A1, May 1, 2025
Int. Cl. G06F 7/00 (2006.01); G06F 16/28 (2019.01); G06F 21/62 (2013.01)
CPC G06F 16/285 (2019.01) [G06F 21/6254 (2013.01)]
20 Claims
 
1. A method comprising:
accessing a database storing a dataset comprising a set of rows each corresponding to a record and a set of columns each corresponding to an attribute;
applying a machine learning model to the dataset, the machine learning model configured to classify each record in the database and produce a measure of feature importance for each attribute in classifying each record;
generating a modified database using two attributes determined to be most highly ranked based on the produced feature importances; and
generating a final database within a non-transitory computer-readable storage medium by iteratively applying the machine learning model to the modified database to produce a set of records and modifying the modified database to include a next-highest ranked attribute until consecutive sets of records have an above-threshold measure of similarity, wherein the attributes included within the modified database before a most-recently included attribute comprise quasi-identifiers.
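
The control flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not name a particular machine learning model, importance measure, or similarity measure, so every function below is a hypothetical stand-in. Here, per-column Shannon entropy is assumed as the "feature importance", the "set of records" produced by the model is assumed to be the set of rows that are unique on the currently included attributes, and Jaccard similarity with an assumed threshold of 0.9 is used to compare consecutive sets. Returning all included attributes when the threshold is never reached is likewise an assumption, since the claim does not address that case.

```python
import math
from collections import Counter


def entropy_importance(rows, attributes):
    """Hypothetical stand-in for model feature importance: Shannon entropy per column."""
    n = len(rows)
    scores = {}
    for attr in attributes:
        counts = Counter(row[attr] for row in rows)
        scores[attr] = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return scores


def singled_out(rows, attributes):
    """Hypothetical stand-in for 'applying the model': indices of rows unique on `attributes`."""
    keys = [tuple(row[a] for a in attributes) for row in rows]
    counts = Counter(keys)
    return {i for i, key in enumerate(keys) if counts[key] == 1}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def detect_quasi_identifiers(rows, attributes, threshold=0.9):
    """Sketch of the loop in claim 1 under the assumptions stated above."""
    scores = entropy_importance(rows, attributes)
    # Rank attributes from most to least important (stable for ties).
    ranked = sorted(attributes, key=lambda a: scores[a], reverse=True)

    # Start the "modified database" with the two highest-ranked attributes.
    included = ranked[:2]
    previous = singled_out(rows, included)

    for next_attr in ranked[2:]:
        included.append(next_attr)  # include the next-highest ranked attribute
        current = singled_out(rows, included)
        if jaccard(previous, current) > threshold:
            # Consecutive sets are similar enough: the attributes included
            # before the most recent one are reported as quasi-identifiers.
            return included[:-1]
        previous = current

    # Assumption: if the threshold is never exceeded, report everything included.
    return included


if __name__ == "__main__":
    sample = [
        {"zip": z, "age": a, "gender": g, "country": "US"}
        for z, a, g in [
            ("A", 30, "M"), ("A", 31, "F"), ("B", 30, "M"), ("B", 31, "F"),
            ("C", 32, "M"), ("C", 33, "F"), ("D", 32, "M"), ("D", 33, "F"),
        ]
    ]
    print(detect_quasi_identifiers(sample, ["zip", "age", "gender", "country"]))
```

In this toy dataset, `zip` and `age` together already make every row unique, so adding `gender` leaves the set of unique rows unchanged and the loop stops. A production system would replace the stand-ins with the actual model and similarity measure, which the claim leaves open.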