| CPC G06F 16/285 (2019.01) [G06F 21/6254 (2013.01)] | 20 Claims |

|
1. A method comprising:
accessing a database storing a dataset comprising a set of rows each corresponding to a record and a set of columns each corresponding to an attribute;
applying a machine learning model to the dataset, the machine learning model configured to classify each record in the database and produce a measure of feature importance for each attribute in classifying each record;
generating a modified database using two attributes determined to be most highly ranked based on the produced feature importances; and
generating a final database within a non-transitory computer-readable storage medium by iteratively applying the machine learning model to the modified database to produce a set of records and modifying the modified database to include a next-highest ranked attribute until consecutive sets of records have an above-threshold measure of similarity, wherein the attributes included within the modified database before a most-recently included attribute comprise quasi-identifiers.
|
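The iterative selection recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: the claim does not name a model, a similarity measure, or a threshold, so every name below is hypothetical. In particular, `distinct_value_importance` stands in for the model's feature importances, `uniquely_identified` stands in for the set of records the model produces, Jaccard similarity stands in for the measure of similarity, and the default `threshold` of 0.9 is an arbitrary choice.

```python
from collections import Counter

# Hypothetical toy dataset: columns are (zip, age, sex, constant flag).
ROWS = [
    ("02139", 34, "F", "x"),
    ("02139", 34, "M", "x"),
    ("02139", 41, "F", "x"),
    ("02140", 41, "F", "x"),
    ("02140", 52, "M", "x"),
    ("02141", 52, "M", "x"),
]


def distinct_value_importance(rows, n_cols):
    """Assumed stand-in for model feature importance:
    the fraction of distinct values in each column."""
    return [len({r[c] for r in rows}) / len(rows) for c in range(n_cols)]


def uniquely_identified(rows, cols):
    """Assumed stand-in for the model output: indices of rows whose
    values on the selected columns occur exactly once."""
    keys = [tuple(r[c] for c in cols) for r in rows]
    counts = Counter(keys)
    return {i for i, k in enumerate(keys) if counts[k] == 1}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_quasi_identifiers(rows, n_cols, threshold=0.9):
    """Start from the two top-ranked columns, add the next-ranked column
    each round, and stop once consecutive record sets are similar enough.
    Returns the columns included before the most recently added one."""
    importances = distinct_value_importance(rows, n_cols)
    # Stable sort: ties keep their original column order.
    ranked = sorted(range(n_cols), key=lambda c: importances[c], reverse=True)
    included = ranked[:2]
    previous = uniquely_identified(rows, included)
    for col in ranked[2:]:
        included.append(col)
        current = uniquely_identified(rows, included)
        if jaccard(previous, current) > threshold:
            return included[:-1]  # drop the most recently included column
        previous = current
    # Assumption: if similarity never exceeds the threshold, keep every column.
    return included


if __name__ == "__main__":
    print(select_quasi_identifiers(ROWS, n_cols=4))
```

On the toy data, adding the constant fourth column leaves the record set unchanged, so the loop stops there and the first three columns are reported as quasi-identifiers. The fallback when the threshold is never exceeded is a design choice of this sketch, since the claim does not address that case.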