US 11,727,147 B2
Systems and methods for anonymizing large scale datasets
Alessandro Epasto, New York, NY (US); Hossein Esfandiari, Jersey City, NJ (US); Vahab Seyed Mirrokni, Hoboken, NJ (US); Andres Munoz Medina, Mountain View, CA (US); Umar Syed, Rahway, NJ (US); and Sergei Vassilvitskii, New Jersey, NJ (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Sep. 10, 2020, as Appl. No. 17/16,788.
Prior Publication US 2022/0075897 A1, Mar. 10, 2022
Int. Cl. G06F 16/00 (2019.01); G06F 21/62 (2013.01); G06N 20/00 (2019.01); G06F 16/28 (2019.01)
CPC G06F 21/6254 (2013.01) [G06F 16/285 (2019.01); G06N 20/00 (2019.01)]
20 Claims
 
1. A computer-implemented method for k-anonymizing a dataset to provide privacy guarantees for all columns in the dataset, the computer-implemented method comprising:
obtaining, by a computing system comprising one or more computing devices, a dataset comprising data indicative of a plurality of entities and at least one data item respective to at least one of the plurality of entities;
clustering, by the computing system, the plurality of entities into at least one entity cluster by mapping the plurality of entities and the at least one data item to a plurality of points in a dimensional space, wherein the at least one of the plurality of entities is represented as a point from among the plurality of points in the dimensional space with a value related to the at least one data item;
determining, by the computing system, a majority condition for the at least one entity cluster, the majority condition indicating that the at least one data item is respective to at least a majority of the plurality of entities; and
assigning, by the computing system, the at least one data item to the plurality of entities in an anonymized dataset based at least in part on the majority condition.
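The steps recited in claim 1 (obtain, cluster, determine a majority condition, assign) can be illustrated with a minimal sketch. It is not the patented implementation. The function names, the 0/1 point encoding, the sort-and-chunk grouping used as a stand-in for clustering, and the strict "more than half" threshold are all assumptions made for illustration only. The dataset is assumed to be a mapping from entity identifiers to sets of data items.

```python
from collections import Counter


def to_points(dataset, items):
    """Map each entity to a 0/1 point with one coordinate per data item (assumed encoding)."""
    return {e: tuple(1 if it in owned else 0 for it in items)
            for e, owned in dataset.items()}


def cluster_entities(points, k):
    """Hypothetical stand-in for clustering: sort entities by point, then chunk into groups of at least k."""
    ordered = sorted(points, key=lambda e: (points[e], e))
    clusters = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    # Fold an undersized trailing group into the previous one so every cluster has >= k members.
    if len(clusters) > 1 and len(clusters[-1]) < k:
        clusters[-2].extend(clusters.pop())
    return clusters


def majority_items(cluster, dataset):
    """Return items held by more than half of the cluster (assumed reading of the majority condition)."""
    counts = Counter(it for e in cluster for it in dataset[e])
    return {it for it, c in counts.items() if 2 * c > len(cluster)}


def anonymize(dataset, k):
    """Assign each cluster's majority items to every entity in that cluster."""
    items = sorted({it for owned in dataset.values() for it in owned})
    points = to_points(dataset, items)
    anonymized = {}
    for cluster in cluster_entities(points, k):
        shared = majority_items(cluster, dataset)
        for e in cluster:
            anonymized[e] = set(shared)
    return anonymized


# Illustrative data only: six entities, four data items.
example = {
    "u1": {"a", "b"}, "u2": {"a", "b"}, "u3": {"a"},
    "u4": {"c"}, "u5": {"c", "d"}, "u6": {"c"},
}
result = anonymize(example, k=3)
```

In this sketch every entity in a cluster receives the same set of items, so each distinct row of the output is shared by at least k entities whenever the input contains at least k entities. Items held by only a minority of a cluster (such as "d" for "u5") are dropped, and items held by a majority are added to members that lacked them (such as "b" for "u3").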