US 12,189,586 B2
Deduplication across multiple different data sources to identify common devices
Rachel Worth Olson, Elk Grove Village, IL (US); Michael Evan Anderson, Chicago, IL (US); Rishi Sriram, Naperville, IL (US); Margaret M. Orton, Chicago, IL (US); Fatemehossadat Miri, Chicago, IL (US); Samantha M. Mowrer, San Francisco, CA (US); David J. Kurzynski, South Elgin, IL (US); and Molly Poppie, Arlington Heights, IL (US)
Assigned to The Nielsen Company (US), LLC, New York, NY (US)
Filed by The Nielsen Company (US), LLC, New York, NY (US)
Filed on Aug. 29, 2022, as Appl. No. 17/898,058.
Application 17/898,058 is a continuation of application No. 16/925,961, filed on Jul. 10, 2020, granted, now 11,429,575.
Claims priority of provisional application 62/873,699, filed on Jul. 12, 2019.
Prior Publication US 2023/0004540 A1, Jan. 5, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/215 (2019.01); H04N 21/442 (2011.01)
CPC G06F 16/215 (2019.01) [H04N 21/44222 (2013.01)]
20 Claims
1. A system for processing multiple data sets to generate deduplicated audience measurement data, the system comprising:
a processor; and
at least one memory storing instructions that, when executed by the processor, cause the system to perform operations comprising:
receiving, via first network communications with one or more computing devices, a first set of data obtained by meter devices having a first meter device type;
receiving, via second network communications with the one or more computing devices, a second set of data obtained by meter devices having a second meter device type, wherein the first meter device type is different from the second meter device type;
processing the first set of data and the second set of data to identify a first media presentation device represented by the first set of data and a second media presentation device represented by the second set of data as a possible common media presentation device;
calculating at least one of a station duration metric, a time match metric or a station path metric, wherein:
i) the station duration metric is based on a first set of durations of time that the first media presentation device tuned to a first set of stations and a second set of durations of time that the second media presentation device tuned to the first set of stations,
ii) the time match metric is based on a first set of times of day that the first media presentation device tuned to a second set of stations and a second set of times of day that the second media presentation device tuned to the second set of stations, and
iii) the station path metric is based on a first sequence of stations tuned to by the first media presentation device and a second sequence of stations tuned to by the second media presentation device;
determining a score based on the at least one of the station duration metric, the time match metric, or the station path metric;
determining that the first media presentation device and the second media presentation device are a common media presentation device based on the score;
processing the first set of data by removing data corresponding to the common media presentation device to generate a final data set; and
storing, in the at least one memory, the final data set as deduplicated audience measurement data.
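For illustration only, the operations recited in claim 1 might be sketched as follows. The claim does not specify any formula, so every concrete choice here is an assumption: the tuning-event layout (station, start minute of day, duration in minutes), the weighted Jaccard overlap used for the station duration metric, the hour-of-day bucketing used for the time match metric, the use of Python's `difflib.SequenceMatcher` for the station path metric, the unweighted mean used as the score, and the 0.8 threshold. All function and variable names are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Assumed (hypothetical) tuning event layout:
# (station, start_minute_of_day, duration_minutes)


def station_duration_metric(events_a, events_b):
    """Weighted Jaccard overlap of total minutes tuned per station (assumed formula)."""
    totals_a, totals_b = defaultdict(float), defaultdict(float)
    for station, _, minutes in events_a:
        totals_a[station] += minutes
    for station, _, minutes in events_b:
        totals_b[station] += minutes
    stations = set(totals_a) | set(totals_b)
    shared = sum(min(totals_a[s], totals_b[s]) for s in stations)
    total = sum(max(totals_a[s], totals_b[s]) for s in stations)
    return shared / total if total else 0.0


def time_match_metric(events_a, events_b):
    """Jaccard overlap of (station, hour-of-day) pairs (assumed formula)."""
    slots_a = {(station, start // 60) for station, start, _ in events_a}
    slots_b = {(station, start // 60) for station, start, _ in events_b}
    union = slots_a | slots_b
    return len(slots_a & slots_b) / len(union) if union else 0.0


def station_path_metric(events_a, events_b):
    """Similarity of the time-ordered station sequences (assumed formula)."""
    path_a = [e[0] for e in sorted(events_a, key=lambda e: e[1])]
    path_b = [e[0] for e in sorted(events_b, key=lambda e: e[1])]
    return SequenceMatcher(None, path_a, path_b).ratio()


def match_score(events_a, events_b):
    """Unweighted mean of the three metrics (assumed combination rule)."""
    metrics = (
        station_duration_metric(events_a, events_b),
        time_match_metric(events_a, events_b),
        station_path_metric(events_a, events_b),
    )
    return sum(metrics) / len(metrics)


def deduplicate(first_set, second_set, candidate_pairs, threshold=0.8):
    """Remove from first_set every device judged common with a second_set device.

    first_set and second_set map device id -> list of tuning events;
    candidate_pairs lists (first_id, second_id) pairs flagged as possible
    common devices. The threshold value is an assumption.
    """
    common = {
        a for a, b in candidate_pairs
        if match_score(first_set[a], second_set[b]) >= threshold
    }
    return {dev: ev for dev, ev in first_set.items() if dev not in common}


if __name__ == "__main__":
    first = {
        "tv1": [("WABC", 480, 30), ("WNBC", 510, 60)],
        "tv2": [("ESPN", 1200, 90)],
    }
    second = {"stb9": [("WABC", 480, 30), ("WNBC", 510, 60)]}
    final = deduplicate(first, second, [("tv1", "stb9"), ("tv2", "stb9")])
    print(sorted(final))
```

In this sketch the device "tv1" reports the same stations, durations, times of day, and station order as "stb9", so it is treated as a common media presentation device and its data is removed from the first set; "tv2" shares nothing with "stb9" and is retained. The claim requires only at least one of the three metrics, so a score built from a single metric would fit the claim language equally well.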