US 12,105,688 B2
Methods and apparatus to estimate audience sizes of media using deduplication based on vector of counts sketch data
Michael Sheppard, Holland, MI (US); Jonathan L. Sullivan, Hurricane, UT (US); Jake Ryan Dailey, San Francisco, CA (US); Damien Forthomme, Seattle, WA (US); Jessica D. Brinson, Chicago, IL (US); Molly Poppie, Arlington Heights, IL (US); Christie Nicole Summers, Baltimore, MD (US); and Diane Morovati Lopez, West Hills, CA (US)
Assigned to The Nielsen Company (US), LLC, New York, NY (US)
Filed by The Nielsen Company (US), LLC, New York, NY (US)
Filed on Jan. 20, 2023, as Appl. No. 18/157,537.
Application 18/157,537 is a continuation of application No. 16/919,974, filed on Jul. 2, 2020, granted, now 11,561,942.
Claims priority of provisional application 62/871,017, filed on Jul. 5, 2019.
Prior Publication US 2023/0161745 A1, May 25, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/215 (2019.01); G06F 16/2455 (2019.01); H04L 67/50 (2022.01); H04N 21/442 (2011.01)
CPC G06F 16/215 (2019.01) [G06F 16/24556 (2019.01); H04L 67/535 (2022.05); H04N 21/44222 (2013.01)] 18 Claims
1. A computing system comprising a processor and a memory, the computing system configured to perform a set of acts comprising:
obtaining a first vector of counts indicative of digital impressions logged by a database proprietor, wherein the first vector of counts represents counts within a plurality of bins and is created by: converting impression identifiers associated with the digital impressions logged by the database proprietor to respective hash values using a hash function, and mapping the hash values to respective bins of the plurality of bins;
obtaining a second vector of counts indicative of digital impressions logged by an audience measurement entity, wherein the second vector of counts represents counts within the plurality of bins and is created by: converting impression identifiers associated with the digital impressions logged by the audience measurement entity to respective hash values using a hash function, and mapping the hash values to respective bins of the plurality of bins;
obtaining a first variance of the first vector of counts;
determining a second variance of the second vector of counts;
determining a covariance of the first vector of counts and the second vector of counts;
determining a number of duplications between the digital impressions logged by the database proprietor and the digital impressions logged by the audience measurement entity using the first variance, the second variance, the covariance, a first cardinality of the digital impressions logged by the database proprietor, and a second cardinality of the digital impressions logged by the audience measurement entity;
determining a unique audience for a combination of the digital impressions logged by the database proprietor and the digital impressions logged by the audience measurement entity using the first cardinality, the second cardinality, and the number of duplications; and
transmitting the unique audience to a third-party.
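
The acts recited in claim 1 can be illustrated with a minimal Python sketch. The claim names the inputs to the duplication estimate (two variances, a covariance, and two cardinalities) but does not state a formula, so the estimator below is an assumption: it scales the correlation of the two vectors of counts by the geometric mean of the two cardinalities. The SHA-256 hash, the bin count, the clamping of the estimate, and all function names are likewise illustrative choices rather than details taken from the patent, and the sketch assumes each impression identifier appears at most once in each log.

```python
import hashlib
import math

NUM_BINS = 4096  # assumed bin count; the claim only requires "a plurality of bins"


def vector_of_counts(impression_ids, num_bins=NUM_BINS):
    """Hash each impression identifier and count how many land in each bin."""
    counts = [0] * num_bins
    for imp_id in impression_ids:
        digest = hashlib.sha256(str(imp_id).encode("utf-8")).digest()
        # Map the hash value to a bin. Both parties must use the same hash
        # function and the same bins so a shared identifier hits the same bin.
        counts[int.from_bytes(digest[:8], "big") % num_bins] += 1
    return counts


def covariance(x, y):
    """Population covariance across bins; covariance(x, x) is the variance."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / len(x)


def estimate_duplications(voc_a, voc_b, card_a, card_b):
    """Assumed estimator: correlation of the vectors times sqrt(card_a * card_b)."""
    var_a = covariance(voc_a, voc_a)
    var_b = covariance(voc_b, voc_b)
    cov_ab = covariance(voc_a, voc_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat vector carries no overlap information
    correlation = cov_ab / math.sqrt(var_a * var_b)
    estimate = correlation * math.sqrt(card_a * card_b)
    # Clamp to the feasible range: 0 <= duplications <= min(card_a, card_b).
    return min(max(estimate, 0.0), float(min(card_a, card_b)))


def unique_audience(card_a, card_b, duplications):
    """Inclusion-exclusion: total audience minus the duplicated part."""
    return card_a + card_b - duplications


# Hypothetical logs: identifiers 4000..5999 appear in both.
db_ids = [f"user-{i}" for i in range(0, 6000)]       # database proprietor
ame_ids = [f"user-{i}" for i in range(4000, 9000)]   # audience measurement entity

dups = estimate_duplications(
    vector_of_counts(db_ids), vector_of_counts(ame_ids), len(db_ids), len(ame_ids)
)
print(round(dups), round(unique_audience(len(db_ids), len(ame_ids), dups)))
```

In this example the true overlap is 2,000 identifiers and the true unique audience is 9,000. The printed estimates should land near those values but will not match exactly, because the estimate relies on sample statistics taken over a finite number of bins. Using more bins generally tightens the estimate at the cost of a larger sketch.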