US 11,854,030 B2
Methods and apparatus to estimate cardinality across multiple datasets represented using bloom filter arrays
Michael Sheppard, Holland, MI (US); Jonathan Sullivan, Hurricane, UT (US); Diane Morovati Lopez, West Hills, CA (US); Jake Ryan Dailey, San Francisco, CA (US); Christie Nicole Summers, Baltimore, MD (US); and Molly Poppie, Arlington Heights, IL (US)
Assigned to The Nielsen Company (US), LLC, New York, NY (US)
Filed by The Nielsen Company (US), LLC, New York, NY (US)
Filed on Jun. 29, 2021, as Appl. No. 17/362,404.
Prior Publication US 2023/0004997 A1, Jan. 5, 2023
Int. Cl. G06Q 10/00 (2023.01); G06Q 30/0204 (2023.01); G06F 16/22 (2019.01); G06Q 30/0201 (2023.01)
CPC G06Q 30/0204 (2013.01) [G06F 16/22 (2019.01); G06Q 30/0201 (2013.01)]
30 Claims
 
1. An apparatus comprising:
communications interface circuitry to:
transmit, via first network communications, Bloom filter parameters to a first database proprietor, a second database proprietor, and a third database proprietor;
receive, via second network communications, first, second, and third Bloom filter arrays generated by respective first, second, and third servers of respective ones of the first, second, and third database proprietors, each of the first, second, and third Bloom filter arrays having a length, the length defined by the Bloom filter parameters, different ones of the first, second, and third Bloom filter arrays representative of different sets of users registered with respective ones of the first, second, and third database proprietors and that accessed media, the first, second, and third Bloom filter arrays including differential privacy noise, the first, second, and third Bloom filter arrays generated to maintain a privacy of the different sets of users such that duplication of the users across the different sets of users cannot be directly determined from the Bloom filter array to determine an audience size across the different sets of users;
at least one memory;
instructions; and
programmable circuitry to execute and/or instantiate the instructions to:
determine, by executing a numerical solver, the length defined by the Bloom filter parameters provided to the first, second, and third database proprietors, the length determined to correspond to a minimum length to provide a first relative error in an estimate of the audience size no greater than a second relative error at a confidence level;
determine an inclusion-exclusion expression that defines the audience size for a user group of interest, terms in the inclusion-exclusion expression corresponding to either a first cardinality of the first Bloom filter array or a second cardinality of a union of two or more of the first, second, and third Bloom filter arrays;
estimate, based on the inclusion-exclusion expression, the audience size of the user group of interest that accounts for the duplication of the users across the different sets of users, the length determined to reduce at least one of memory requirements to store data associated with processing of the first, second, and third Bloom filter arrays or processing requirements to implement the estimation relative to a longer length for the first, second, and third Bloom filter arrays; and
cause the communications interface circuitry to transmit, via a third network communication, a report based on the estimate of the audience size to a third-party entity.
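
Illustrative sketch (not part of the claim): the inclusion-exclusion step recited above can be pictured with the short Python example below. It rests on assumptions that the claim does not state: each array uses a single hash function (SHA-256 here), the cardinality of an array is estimated with the standard fill-ratio formula n = -m ln(1 - x/m) for x set bits in a length-m array, a union of user sets is formed by a bitwise OR of arrays built with identical parameters, and the differential privacy noise is left out entirely. The function names make_filter, cardinality, union_size, and intersection_size are hypothetical, and the length M is simply picked by hand rather than found by a numerical solver.

```python
import hashlib
import math
from itertools import combinations


def make_filter(user_ids, m):
    """Return a length-m, one-hash Bloom filter packed into a Python int."""
    bits = 0
    for uid in user_ids:
        digest = hashlib.sha256(str(uid).encode()).digest()
        # Map the user to one array position and set that bit.
        bits |= 1 << (int.from_bytes(digest[:8], "big") % m)
    return bits


def cardinality(bits, m):
    """Estimate the number of distinct users from the fraction of set bits."""
    ones = bin(bits).count("1")
    if ones >= m:
        raise ValueError("filter is saturated; use a longer length")
    return -m * math.log(1 - ones / m)


def union_size(filters, m):
    """Cardinality of a union: OR the arrays together, then estimate."""
    combined = 0
    for f in filters:
        combined |= f
    return cardinality(combined, m)


def intersection_size(filters, m):
    """Users present in every set, written only in terms of single arrays
    and unions: sum over non-empty subsets S of (-1)^(|S|+1) * |union of S|."""
    total = 0.0
    for r in range(1, len(filters) + 1):
        sign = 1 if r % 2 else -1
        for subset in combinations(filters, r):
            total += sign * union_size(subset, m)
    return total


M = 1 << 20  # hand-picked length, shared by all three proprietors (assumption)
first = make_filter(range(0, 3000), M)       # users 0..2999
second = make_filter(range(2000, 5000), M)   # users 2000..4999
third = make_filter(range(2500, 6000), M)    # users 2500..5999

# The true deduplicated audience is 6000 users; 500 users are in all three sets.
deduplicated = union_size([first, second, third], M)
in_all_three = intersection_size([first, second, third], M)
```

Every term of the expression is either the cardinality of one array or the cardinality of a union of arrays, so no step compares individual users across proprietors. A shorter M costs less memory and processing but widens the estimation error, which is the trade-off the claimed length determination addresses.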