US 12,032,535 B2
Methods and apparatus to estimate audience sizes of media using deduplication based on multiple vectors of counts
Michael R. Sheppard, Holland, MI (US); Jake Ryan Dailey, San Francisco, CA (US); Damien Forthomme, Seattle, WA (US); Jonathan Sullivan, Hurricane, UT (US); Jessica Brinson, Chicago, IL (US); Christie Nicole Summers, Baltimore, MD (US); Diane Morovati Lopez, West Hills, CA (US); and Molly Poppie, Arlington Heights, IL (US)
Assigned to The Nielsen Company (US), LLC, New York, NY (US)
Filed by The Nielsen Company (US), LLC, New York, NY (US)
Filed on Jun. 30, 2020, as Appl. No. 16/917,459.
Prior Publication US 2021/0406232 A1, Dec. 30, 2021
Int. Cl. G06F 16/215 (2019.01); G06Q 30/0204 (2023.01); H04N 21/466 (2011.01); G06Q 50/26 (2012.01)
CPC G06F 16/215 (2019.01) [G06Q 30/0205 (2013.01); H04N 21/4667 (2013.01); G06Q 50/265 (2013.01)]
28 Claims
1. An apparatus to determine an audience size for media based on sketch data, the apparatus comprising:
memory;
programmable circuitry; and
instructions to cause the programmable circuitry to:
obtain (a) first sketch data included in a first network communication from a first server of a first database proprietor and (b) second sketch data included in a second network communication from a second server of a second database proprietor, the first sketch data to represent a first audience of a media item including first subscribers of the first database proprietor, the second sketch data to represent a second audience of the media item including second subscribers of the second database proprietor, the first sketch data including a first plurality of vectors of counts, each vector of counts of the first plurality of vectors of counts representing the first audience as a distribution of first hash values, the first hash values corresponding to the first subscribers and generated using a hash algorithm, each vector of counts of the first plurality of vectors of counts including a plurality of bins having respective bin numbers, the first hash values usable to determine bin numbers for assigning the first subscribers to respective bins of the plurality of bins, the second sketch data including a second plurality of vectors of counts, each vector of counts of the second plurality of vectors of counts representing the second audience as a distribution of second hash values, the second hash values corresponding to the second subscribers and generated using the hash algorithm, each vector of counts of the second plurality of vectors of counts including the plurality of bins, the second hash values usable to determine bin numbers for assigning the second subscribers to respective bins of the plurality of bins, the first and second sketch data to preserve privacy of the first and second subscribers by representing the first and second subscribers via the first and second pluralities of vectors of counts instead of sharing identifying information associated with the first and second subscribers, wherein the first audience is known to the first database proprietor but not to the second database proprietor, and wherein the second audience is known to the second database proprietor but not to the first database proprietor;
determine coefficient values for a polynomial based on (a) normalized weighted sums of variances, (b) a normalized weighted sum of covariances, and (c) cardinalities, the normalized weighted sums of variances, the normalized weighted sum of covariances, and the cardinalities corresponding to the first plurality of vectors of counts and the second plurality of vectors of counts;
determine a real root value of the polynomial, the real root value indicative of a number of audience members of the first audience represented in the first plurality of vectors of counts that are also represented in the second plurality of vectors of counts;
determine the audience size based on the real root value and the cardinalities of the first plurality of vectors of counts and the second plurality of vectors of counts; and
cause transmission of the audience size to a customer computer.
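
For illustration only, the steps recited in claim 1 can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation. The claim does not give the hash algorithm, how the vectors within a plurality differ, the weights or normalization of the variance and covariance sums, or the form of the polynomial. The sketch therefore assumes SHA-256 with a per-vector salt, computes only unweighted per-vector statistics, uses placeholder polynomial coefficients, and assumes the audience size follows from inclusion-exclusion. All function names (build_sketch, vector_statistics, real_root_in_range, deduplicated_audience_size) are hypothetical.

```python
import hashlib


def build_sketch(subscriber_ids, num_vectors=4, num_bins=16):
    """Return `num_vectors` vectors of counts, each with `num_bins` bins."""
    sketch = [[0] * num_bins for _ in range(num_vectors)]
    for subscriber_id in subscriber_ids:
        for v in range(num_vectors):
            # Assumption: salting with the vector index gives each vector its
            # own hash values; the claim does not say how the vectors differ.
            data = f"{v}:{subscriber_id}".encode("utf-8")
            digest = hashlib.sha256(data).digest()
            bin_number = int.from_bytes(digest[:8], "big") % num_bins
            sketch[v][bin_number] += 1
    return sketch


def vector_statistics(counts_a, counts_b):
    """Cardinalities, population variances, and covariance of two vectors."""
    m = len(counts_a)
    mean_a, mean_b = sum(counts_a) / m, sum(counts_b) / m
    var_a = sum((a - mean_a) ** 2 for a in counts_a) / m
    var_b = sum((b - mean_b) ** 2 for b in counts_b) / m
    cov_ab = sum((a - mean_a) * (b - mean_b)
                 for a, b in zip(counts_a, counts_b)) / m
    # In this sketch each subscriber lands in exactly one bin per vector,
    # so the counts of a vector sum to the cardinality it represents.
    return {"card_a": sum(counts_a), "card_b": sum(counts_b),
            "var_a": var_a, "var_b": var_b, "cov_ab": cov_ab}


def evaluate(coeffs, x):
    """Horner evaluation; coefficients run from highest degree to constant."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def real_root_in_range(coeffs, low, high, tol=1e-9):
    """Bisection for a real root in [low, high]; requires a sign change."""
    f_low, f_high = evaluate(coeffs, low), evaluate(coeffs, high)
    if f_low == 0:
        return low
    if f_high == 0:
        return high
    if f_low * f_high > 0:
        raise ValueError("no sign change on the interval")
    while high - low > tol:
        mid = (low + high) / 2
        f_mid = evaluate(coeffs, mid)
        if f_mid == 0:
            return mid
        if f_low * f_mid < 0:
            high = mid
        else:
            low, f_low = mid, f_mid
    return (low + high) / 2


def deduplicated_audience_size(card_a, card_b, overlap):
    """Assumed inclusion-exclusion: members of both audiences count once."""
    return card_a + card_b - overlap


# Placeholder coefficients for x^3 - 500x^2 + x - 500, chosen only so the
# example has a single real root; they are not derived from sketch data.
overlap = real_root_in_range([1.0, -500.0, 1.0, -500.0], 0.0, 2000.0)
print(deduplicated_audience_size(2000, 2000, overlap))
```

Restricting the root search to the interval from zero to the smaller cardinality reflects the fact that the overlap cannot exceed either audience. The example uses a wider interval only to keep the placeholder polynomial simple.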