| CPC G06F 21/6227 (2013.01) [G06F 16/2456 (2019.01)] | 20 Claims |

|
1. A system comprising:
one or more computers of an entity configured to implement a privacy-preserving dataset joining system to:
receive a query that specifies a join operation to be performed on contents of at least a first private dataset of the entity and a second private dataset of a different entity;
in response to receiving the query:
produce a first sketch of the first private dataset according to a plurality of shared parameters, wherein the first private dataset comprises a plurality of identities, wherein individual identities of the plurality of identities of the first private dataset correspond with at least one value of a first set of values, and wherein the first sketch at least comprises, for a first value of the first set of values, a mapping, to entries of a first instance of a data structure using a hash function, of a first set of identities of the plurality of identities that correspond to the first value;
obtain a privacy-preserving second sketch of the second private dataset that was produced according to the plurality of shared parameters, wherein the second private dataset comprises at least some identities of the plurality of identities, wherein individual identities of the at least some identities of the second private dataset correspond with at least one value of a second set of values, wherein the privacy-preserving second sketch at least comprises, for a second value of the second set of values, a mapping, to entries of a second instance of the data structure using the hash function, of a second set of identities of the at least some identities that correspond to the second value, and wherein the privacy-preserving second sketch further comprises added noise;
join the first sketch and the privacy-preserving second sketch to produce a joined dataset;
determine an estimate of a number of identities that correspond to both the first value of the first set of values and the second value of the second set of values from the joined dataset; and
respond to the query with the estimate of the number of identities that correspond to both the first value and the second value.
|
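The steps recited in claim 1 (sketch each dataset under shared parameters, add noise to the second sketch, join, estimate the overlap) can be illustrated with a minimal Python sketch. This is not the claimed implementation; it rests on several assumptions the claim does not state. The shared parameters are assumed to be a hash seed and an entry count. The data structure is assumed to be an array of signed counters, in the style of a count sketch. The added noise is assumed to be zero-mean Laplace noise. The join is assumed to be an inner product of the two arrays. All function and variable names (`make_sketch`, `add_noise`, `estimate_overlap`, `SEED`, `NUM_ENTRIES`) are hypothetical, and the noise scale is chosen for illustration only, with no privacy guarantee implied.

```python
import hashlib
import random


def _bucket_and_sign(identity, seed, num_entries):
    """Map an identity to an entry index and a +/-1 sign using a seeded hash."""
    digest = hashlib.sha256(f"{seed}:{identity}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % num_entries
    sign = 1 if digest[8] & 1 else -1
    return bucket, sign


def make_sketch(identities, seed, num_entries):
    """Sketch the set of identities that correspond to one value.

    Assumption: each distinct identity adds its sign to one counter.
    """
    sketch = [0.0] * num_entries
    for identity in set(identities):
        bucket, sign = _bucket_and_sign(identity, seed, num_entries)
        sketch[bucket] += sign
    return sketch


def add_noise(sketch, scale, rng):
    """Return a copy of the sketch with zero-mean Laplace noise on every entry.

    The difference of two exponential draws with rate 1/scale is Laplace(0, scale).
    """
    if scale <= 0:
        return list(sketch)
    rate = 1.0 / scale
    return [x + rng.expovariate(rate) - rng.expovariate(rate) for x in sketch]


def estimate_overlap(first_sketch, second_sketch):
    """Join two sketches by inner product to estimate the shared identity count."""
    if len(first_sketch) != len(second_sketch):
        raise ValueError("sketches must be built with the same shared parameters")
    return sum(a * b for a, b in zip(first_sketch, second_sketch))


# Shared parameters agreed on by both entities (hypothetical values).
SEED = 2024
NUM_ENTRIES = 4096

# Identities holding the first value in dataset 1 and the second value in dataset 2.
first_ids = [f"user{i}" for i in range(0, 60)]
second_ids = [f"user{i}" for i in range(40, 100)]

first_sketch = make_sketch(first_ids, SEED, NUM_ENTRIES)
second_sketch = add_noise(
    make_sketch(second_ids, SEED, NUM_ENTRIES), scale=1.0, rng=random.Random(7)
)

estimate = estimate_overlap(first_sketch, second_sketch)
print(f"estimated overlap: {estimate:.1f} (true overlap: 20)")
```

Under these assumptions, the signs make cross terms between different identities cancel in expectation, and the noise has zero mean, so the inner product remains an unbiased estimate of the overlap. Its variance grows with the noise scale and shrinks as the number of entries increases.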