CPC G06F 21/6254 (2013.01) [G06F 16/285 (2019.01); G06N 20/00 (2019.01)] | 20 Claims |
1. A computer-implemented method for k-anonymizing a dataset to provide privacy guarantees for all columns in the dataset, the computer-implemented method comprising:
obtaining, by a computing system comprising one or more computing devices, a dataset comprising data indicative of a plurality of entities and at least one data item respective to at least one of the plurality of entities;
clustering, by the computing system, the plurality of entities into at least one entity cluster by mapping the plurality of entities and the at least one data item to a plurality of points in a dimensional space, wherein the at least one of the plurality of entities is represented as a point from among the plurality of points in the dimensional space with a value related to the at least one data item;
determining, by the computing system, a majority condition for the at least one entity cluster, the majority condition indicating that the at least one data item is respective to at least a majority of the plurality of entities; and
assigning, by the computing system, the at least one data item to the plurality of entities in an anonymized dataset based at least in part on the majority condition.
|
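The steps recited in claim 1 (mapping entities to points, clustering, determining a majority condition, and assigning items) can be illustrated with a minimal sketch. It is not the patented implementation, and the claim does not fix several choices that are assumed here: the dataset is a hypothetical dictionary from entity identifier to a set of data items, each entity is mapped to a binary point with one dimension per item, clusters are formed by a simple greedy grouping on Hamming distance, and the majority condition is evaluated within each cluster. All function names are illustrative.

```python
from collections import Counter


def to_points(dataset, items):
    """Map each entity to a binary point: 1 where the entity has the item, else 0."""
    return {e: tuple(1 if it in held else 0 for it in items)
            for e, held in dataset.items()}


def hamming(p, q):
    """Number of coordinates in which two points differ."""
    return sum(a != b for a, b in zip(p, q))


def cluster_entities(points, k):
    """Greedy grouping (an assumption, not taken from the claim).

    Seeds a cluster with the first unassigned entity and adds its k - 1
    nearest neighbours. The final cluster absorbs the leftovers, so every
    cluster has at least k members whenever there are at least k entities.
    """
    remaining = sorted(points)
    clusters = []
    while len(remaining) >= 2 * k:
        seed = remaining.pop(0)
        remaining.sort(key=lambda e: (hamming(points[seed], points[e]), e))
        clusters.append([seed] + remaining[:k - 1])
        remaining = sorted(remaining[k - 1:])
    if remaining:
        clusters.append(remaining)
    return clusters


def majority_items(cluster, dataset):
    """Items held by strictly more than half of the entities in the cluster."""
    counts = Counter(it for e in cluster for it in dataset[e])
    return {it for it, n in counts.items() if 2 * n > len(cluster)}


def anonymize(dataset, k):
    """Assign each cluster's majority items to every entity in that cluster."""
    items = sorted({it for held in dataset.values() for it in held})
    points = to_points(dataset, items)
    anonymized = {}
    for cluster in cluster_entities(points, k):
        kept = majority_items(cluster, dataset)
        for e in cluster:
            anonymized[e] = set(kept)
    return anonymized


# Hypothetical example data: six entities, four possible items.
example = {
    "a": {"x", "y"}, "b": {"x"}, "c": {"x", "y"},
    "d": {"z"}, "e": {"z", "w"}, "f": {"z"},
}
result = anonymize(example, k=3)
```

In this sketch every entity in a cluster ends up with an identical item set, so each output row is shared by at least k entities when the input has at least k entities. Items held by only a minority of a cluster (such as "w" in the example) are dropped, while majority items are also assigned to cluster members that did not originally hold them.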