US 12,072,998 B2
Differentially private processing and database storage
Ishaan Nerurkar, Berkeley, CA (US); Christopher Hockenbrocht, Berkeley, CA (US); Liam Damewood, Walnut Creek, CA (US); Mihai Maruseac, Berkeley, CA (US); and Alexander Rozenshteyn, Berkeley, CA (US)
Assigned to Snowflake Inc., Bozeman, MT (US)
Filed by Snowflake Inc., Bozeman, MT (US)
Filed on Jul. 29, 2021, as Appl. No. 17/389,100.
Application 17/389,100 is a continuation of application No. 16/810,708, filed on Mar. 5, 2020, granted, now 11,100,247.
Application 16/810,708 is a continuation of application No. 16/238,439, filed on Jan. 2, 2019, granted, now 10,733,320, issued on Aug. 4, 2020.
Application 16/238,439 is a continuation of application No. 15/793,907, filed on Oct. 25, 2017, granted, now 10,229,287, issued on Mar. 12, 2019.
Application 15/793,907 is a continuation of application No. 15/203,797, filed on Jul. 7, 2016, granted, now 10,192,069, issued on Jan. 29, 2019.
Claims priority of provisional application 62/249,938, filed on Nov. 2, 2015.
Prior Publication US 2021/0357523 A1, Nov. 18, 2021
Int. Cl. G06F 21/62 (2013.01); G06F 16/2453 (2019.01); G06F 16/2455 (2019.01); G06F 16/2458 (2019.01); G06F 16/248 (2019.01); G06F 16/25 (2019.01); G06N 5/01 (2023.01); G06N 20/00 (2019.01); G06N 20/20 (2019.01); H04L 9/40 (2022.01)
CPC G06F 21/6227 (2013.01) [G06F 16/24547 (2019.01); G06F 16/2455 (2019.01); G06F 16/2462 (2019.01); G06F 16/2465 (2019.01); G06F 16/248 (2019.01); G06F 16/25 (2019.01); G06F 21/6218 (2013.01); G06F 21/6245 (2013.01); G06F 21/6254 (2013.01); G06N 5/01 (2023.01); G06N 20/00 (2019.01); G06N 20/20 (2019.01); H04L 63/105 (2013.01)]
20 Claims
 
1. A database system configured to implement differential privacy, comprising:
a processor configured to execute computer program instructions; and
a non-transitory computer-readable storage medium storing computer program instructions executable by the processor to perform actions comprising:
receiving a database query requesting a differentially private response to the query;
determining a privacy parameter ε associated with the query, wherein ε describes a degree of information to release about a set of data stored by the database system that is responsive to the query;
identifying a privacy budget associated with the query, the privacy budget specified in terms of ε and representing a degree of information available to be released about data by the database system;
decrementing the privacy budget by a first ε spend determined responsive to the query; and
applying the query to the database system by performing a differentially private set of operations on the set of data stored by the database system that is responsive to the query to produce a differentially private result set that releases the degree of information about the set of data described by the privacy parameter ε, wherein a second subset of the set of data is labeled with a category chosen from a set of two or more categories, wherein the differentially private set of operations comprises:
performing a count operation on a first subset of the set of data;
perturbing results of the count operation on the first subset by a factor defined by a Gaussian random variable G( ) to produce the differentially private result set;
receiving a trained classifier and generating an output vector by applying the classifier to entries of the second subset, each element of the output vector corresponding to a numerical output of the classifier for a corresponding entry in the second subset;
identifying a threshold value and assigning categories for each of the elements of the output vector based on a perturbed threshold value; and
recording counts related to a performance of the classifier, the counts generated by comparing the assigned categories of the elements of the output vector to the corresponding label in the second subset, and the differentially private set of operations comprises:
perturbing the threshold value based on a second ε spend ε₂ to generate the perturbed threshold value; and
perturbing the counts relating to the performance of the classifier based on ε₂ to produce the perturbed counts as the differentially private result set.
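The budget steps of claim 1 (identifying a privacy budget expressed in ε and decrementing it by an ε spend for each query) can be illustrated with a small ledger. This is a minimal sketch under stated assumptions, not the patented implementation:
- The class and exception names are hypothetical.
- Spends are assumed to add up under basic sequential composition.
- Refusing a spend that exceeds the remaining budget is an assumed policy; the claim does not say what happens when the budget is exhausted.

```python
from dataclasses import dataclass


class PrivacyBudgetExceeded(Exception):
    """Hypothetical error: an ε spend would exceed the remaining budget."""


@dataclass
class PrivacyBudget:
    # Hypothetical ledger: total ε allotted for queries against a data set.
    total_epsilon: float
    spent: float = 0.0

    @property
    def remaining(self) -> float:
        return self.total_epsilon - self.spent

    def decrement(self, epsilon_spend: float) -> float:
        """Charge one query's ε spend and return the remaining budget.

        Assumes basic sequential composition: the ε spends of successive
        queries simply add up against the total.
        """
        if epsilon_spend <= 0:
            raise ValueError("ε spend must be positive")
        if epsilon_spend > self.remaining:
            # Assumed policy: reject the query instead of overspending.
            raise PrivacyBudgetExceeded(
                f"spend {epsilon_spend} exceeds remaining {self.remaining}"
            )
        self.spent += epsilon_spend
        return self.remaining
```

For example, a budget of ε = 1.0 charged 0.25 per query leaves 0.75, then 0.5. A further spend of 0.75 is refused, and the recorded spend is left unchanged.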
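The count step (a count operation on a first subset, perturbed by a Gaussian random variable G( )) can be sketched with the textbook Gaussian mechanism. This sketch makes several assumptions beyond the claim:
- The noise is added to the count, reading "a factor defined by a Gaussian random variable" as additive noise.
- The count has sensitivity 1.
- A failure probability δ is introduced, which the claim does not mention.
- The standard deviation is the classic σ = Δ·√(2 ln(1.25/δ))/ε, which is valid for 0 < ε < 1.

The function names are hypothetical.

```python
import math
import random
from typing import Callable, Iterable, Optional


def gaussian_sigma(epsilon: float, delta: float, sensitivity: float = 1.0) -> float:
    """Classic Gaussian-mechanism noise scale for (ε, δ)-DP, valid for 0 < ε < 1."""
    if not 0 < epsilon < 1:
        raise ValueError("this σ formula requires 0 < ε < 1")
    if not 0 < delta < 1:
        raise ValueError("δ must lie in (0, 1)")
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


def dp_count(
    rows: Iterable,
    predicate: Callable[[object], bool],
    epsilon: float,
    delta: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Count rows matching `predicate`, then add Gaussian noise G(0, σ²)."""
    rng = rng if rng is not None else random.Random()
    true_count = sum(1 for r in rows if predicate(r))
    # Adding or removing one row changes the count by at most 1 (sensitivity 1).
    return true_count + rng.gauss(0.0, gaussian_sigma(epsilon, delta))
```

With ε = 0.5 and δ = 10⁻⁵, σ is roughly 9.7, so a true count of 50 is released with noise of that order. Python's `random` module is not a cryptographically secure noise source. A deployed system would need a vetted sampler.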
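The classifier-evaluation steps follow the order recited in claim 1:
1. Apply a trained classifier to each labeled entry to get an output vector of scores.
2. Perturb the threshold using ε₂.
3. Assign categories against the perturbed threshold.
4. Tally the counts that describe classifier performance.
5. Perturb those counts using ε₂.

This is a minimal sketch under several assumptions:
- It uses two categories (labels 0 and 1) and confusion-matrix counts (TP, FP, TN, FN).
- It uses Laplace noise for both perturbations, because the claim does not name the distribution for this step.
- The sensitivities are hypothetical parameters.
- It applies ε₂ to each perturbation. The claim does not state how ε₂ is allocated, so a real system would have to justify that split in its privacy accounting.

```python
import random
from typing import Callable, Dict, Optional, Sequence


def laplace(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_classifier_counts(
    classifier: Callable[[Sequence[float]], float],
    entries: Sequence[Sequence[float]],
    labels: Sequence[int],
    threshold: float,
    epsilon2: float,
    threshold_sensitivity: float = 1.0,  # hypothetical
    count_sensitivity: float = 1.0,  # hypothetical
    rng: Optional[random.Random] = None,
) -> Dict[str, float]:
    if epsilon2 <= 0:
        raise ValueError("ε₂ must be positive")
    if len(entries) != len(labels):
        raise ValueError("each entry needs exactly one label")
    if any(y not in (0, 1) for y in labels):
        raise ValueError("this sketch supports two categories labeled 0 and 1")
    rng = rng if rng is not None else random.Random()

    # Output vector: one numerical classifier output per labeled entry.
    scores = [classifier(x) for x in entries]
    # Perturb the threshold based on ε₂.
    noisy_threshold = threshold + laplace(threshold_sensitivity / epsilon2, rng)
    # Assign category 1 when the score meets the perturbed threshold, else 0.
    predicted = [1 if s >= noisy_threshold else 0 for s in scores]

    # Compare assigned categories with the true labels.
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, y in zip(predicted, labels):
        if p == 1 and y == 1:
            counts["tp"] += 1
        elif p == 1 and y == 0:
            counts["fp"] += 1
        elif p == 0 and y == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1

    # Perturb the performance counts based on ε₂.
    scale = count_sensitivity / epsilon2
    return {name: value + laplace(scale, rng) for name, value in counts.items()}
```

Suppose a score-threshold classifier perfectly separates two positives from two negatives. With a very large ε₂ (negligible noise, and therefore no meaningful privacy), the rounded result is approximately tp = 2, tn = 2, fp = 0, fn = 0. Smaller ε₂ values trade accuracy of these counts for stronger privacy.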