US 12,147,397 B2
Method and system for detecting data bucket inconsistencies for A/B experimentation
Niru Appikatala, San Jose, CA (US); Sudhir Chauhan, Sunnyvale, CA (US); Miao Chen, Sunnyvale, CA (US); and Chandrashekhar Shaw, Sunnyvale, CA (US)
Assigned to YAHOO AD TECH LLC, Dulles, VA (US)
Filed by VERIZON MEDIA INC., New York, NY (US)
Filed on Aug. 15, 2017, as Appl. No. 15/677,917.
Prior Publication US 2019/0057118 A1, Feb. 21, 2019
Int. Cl. G06F 16/21 (2019.01); G06F 16/16 (2019.01); G06F 16/215 (2019.01); G06F 16/22 (2019.01); G06F 16/957 (2019.01)
CPC G06F 16/212 (2019.01) [G06F 16/214 (2019.01); G06F 16/215 (2019.01); G06F 16/9577 (2019.01); G06F 16/164 (2019.01); G06F 16/2255 (2019.01)]
20 Claims
1. A method for identifying data bucket overlap with online experiments, the method being implemented on at least one machine comprising at least one processor, memory, and communications circuitry, and the method comprising:
obtaining first data representing a first set of identifiers associated with a first data bucket of a first online experiment;
obtaining second data representing a second set of identifiers associated with a second data bucket of the first online experiment;
determining, based on the first data and the second data, a first number of identifiers from the first set of identifiers associated with the first data bucket, wherein each of the first number of identifiers includes both a first tag associated with the first data bucket and a second tag associated with the second data bucket, and wherein each of the first number of identifiers is associated with a user device that is assigned with the first and second tags while interacting with the first online experiment at first and second times, respectively;
determining a ratio of the first number of identifiers to a total number of the first set of identifiers associated with the first data bucket;
if the ratio exceeds a threshold, generating a data flag indicating that results associated with the first online experiment are inconsistent; and
eliminating, based on the data flag, an error causing the first number of identifiers to be assigned to both the first data bucket and the second data bucket.
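
The determining, ratio, and flagging steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the `Assignment` record, the `detect_bucket_overlap` function, the tag names "A" and "B", and the threshold value of 0.1 are all hypothetical and chosen only for illustration. The claim does not specify a threshold, and the final step of eliminating the underlying error is not modeled here.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Assignment(NamedTuple):
    """Hypothetical log record: a bucket tag given to an identifier at some time."""
    identifier: str
    bucket_tag: str
    timestamp: int


class OverlapReport(NamedTuple):
    overlap_count: int   # identifiers in the first bucket that also carry the second tag
    bucket_size: int     # total identifiers in the first bucket
    ratio: float         # overlap_count / bucket_size
    inconsistent: bool   # the "data flag": True when ratio exceeds the threshold


def detect_bucket_overlap(
    assignments: Iterable[Assignment],
    first_tag: str,
    second_tag: str,
    threshold: float,
) -> OverlapReport:
    # Collect every tag each identifier received over the course of the experiment.
    tags_by_id = defaultdict(set)
    for a in assignments:
        tags_by_id[a.identifier].add(a.bucket_tag)

    # First set of identifiers: those associated with the first data bucket.
    first_bucket = [i for i, tags in tags_by_id.items() if first_tag in tags]
    # First number of identifiers: those that carry both tags.
    overlapping = [i for i in first_bucket if second_tag in tags_by_id[i]]

    # Guard against an empty bucket to avoid division by zero.
    ratio = len(overlapping) / len(first_bucket) if first_bucket else 0.0
    return OverlapReport(len(overlapping), len(first_bucket), ratio, ratio > threshold)


# Illustrative data: identifier "u1" is tagged "A" at time 100 and "B" at time 200.
log = [
    Assignment("u1", "A", 100),
    Assignment("u2", "A", 101),
    Assignment("u3", "A", 102),
    Assignment("u4", "A", 103),
    Assignment("u1", "B", 200),
    Assignment("u5", "B", 201),
]
report = detect_bucket_overlap(log, "A", "B", threshold=0.1)
print(report)
```

In this example one of the four identifiers in bucket "A" also carries tag "B", so the ratio is 0.25 and the flag is raised under the assumed threshold. The ratio is computed relative to the first bucket only, as the claim recites, so swapping the two tags can give a different result.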