US 12,450,227 B2
Real time system for ingestion, aggregation, and identity association of data from user actions performed on websites or applications
Cynthia Rogers, Palo Alto, CA (US); William Pentney, San Francisco, CA (US); Eric Pollmann, Los Altos, CA (US); and Muhammad Bilal Mahmood, San Francisco, CA (US)
Assigned to Amplitude, Inc., San Francisco, CA (US)
Filed by AMPLITUDE, INC., San Francisco, CA (US)
Filed on Oct. 6, 2023, as Appl. No. 18/482,774.
Application 18/482,774 is a continuation of application No. 16/740,302, filed on Jan. 10, 2020, granted, now 11,803,536.
Prior Publication US 2024/0104088 A1, Mar. 28, 2024
Int. Cl. G06F 16/20 (2019.01); G06F 16/23 (2019.01); G06F 16/25 (2019.01)
CPC G06F 16/2379 (2019.01) [G06F 16/254 (2019.01)]
22 Claims
 
1. A computer-implemented method comprising:
storing a plurality of event records corresponding to a respective plurality of user interactions with a graphical user interface for a website or an application in a persistent storage, the storing comprising:
receiving the plurality of event records;
grouping the plurality of event records and compressing the plurality of event records for storage in a compressed format as a compressed set of event records, wherein the compressed set of event records is stored in association with a file path; and
writing the file path of the compressed set of event records to a publish-subscribe queue;
ingesting, from the publish-subscribe queue, a stream of data comprising the compressed set of event records, wherein the compressed set of event records is accessed using the file path as retrieved from a publish/subscribe message, wherein the ingesting decompresses the compressed set of event records back into the plurality of event records and is performed in parallel to the storing;
determining that a first record from the plurality of event records includes a first anonymous identifier and a first known identifier;
adding a mapping between the first anonymous identifier and the first known identifier to an identifier resolution database;
determining that a second record from the plurality of event records includes a second anonymous identifier and no known identifier;
using the identifier resolution database to identify a second known identifier that is mapped to the second anonymous identifier;
updating the second record to include the second known identifier; and
including the first record and the updated second record in a training matrix for training a machine learning model to compute causal inferences associated with the plurality of user interactions.
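
Illustrative note (not part of the claim): the storing, ingesting, and identifier-resolution steps recited above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The claim does not specify gzip, JSON, a local directory, an in-process `queue.Queue`, a plain dict as the identifier resolution database, or the field names `anon_id` and `known_id`; all of these, and all function names, are hypothetical stand-ins. The parallel execution of storing and ingesting and the construction of the training matrix are omitted.

```python
import gzip
import json
import queue
import tempfile
import uuid
from pathlib import Path


def store_batch(records, directory, pubsub):
    """Group records, compress them into one file, and publish its file path."""
    path = Path(directory) / f"batch-{uuid.uuid4().hex}.json.gz"
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        json.dump(records, fh)
    pubsub.put(str(path))  # only the file path goes through the queue
    return path


def ingest_batch(pubsub):
    """Take a file path from the queue and decompress the records it points to."""
    path = pubsub.get()
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def resolve_identities(records, id_map):
    """Single pass over records in arrival order.

    A record with both identifiers adds a mapping to id_map (hypothetical
    stand-in for the identifier resolution database). A record with only an
    anonymous identifier is updated with the mapped known identifier, if any.
    """
    resolved = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's records are not mutated
        anon, known = rec.get("anon_id"), rec.get("known_id")
        if anon and known:
            id_map[anon] = known
        elif anon and anon in id_map:
            rec["known_id"] = id_map[anon]
        resolved.append(rec)
    return resolved


if __name__ == "__main__":
    events = [
        {"anon_id": "a1", "known_id": "user-42", "action": "login"},
        {"anon_id": "a1", "known_id": None, "action": "click"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        q = queue.Queue()
        store_batch(events, tmp, q)
        for row in resolve_identities(ingest_batch(q), {}):
            print(row)
```

Passing only the file path through the queue, as the claim recites, keeps queue messages small regardless of batch size; the consumer pulls the compressed payload from storage when it is ready to process it.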