US 12,386,875 B2
Massive scale heterogeneous data ingestion and user resolution
Anukool Rege, Basking Ridge, NJ (US); Prashant Kumar Sahay, West Windsor, NJ (US); Mervyn Lally, Laguna Niguel, CA (US); Shirish Kumar, Saratoga, CA (US); and Sanskar Sahay, Princeton, NJ (US)
Assigned to Experian Information Solutions, Inc., Costa Mesa, CA (US)
Filed by Experian Information Solutions, Inc., Costa Mesa, CA (US)
Filed on May 2, 2023, as Appl. No. 18/310,989.
Application 18/310,989 is a continuation of application No. 17/457,757, filed on Dec. 6, 2021, granted, now 11,681,733.
Application 17/457,757 is a continuation of application No. 15/885,239, filed on Jan. 31, 2018, granted, now 11,227,001, issued on Jan. 18, 2022.
Claims priority of provisional application 62/452,701, filed on Jan. 31, 2017.
Prior Publication US 2024/0061873 A1, Feb. 22, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/335 (2019.01); G06F 16/2457 (2019.01); G06F 16/901 (2019.01); G06F 16/9535 (2019.01); G06Q 40/03 (2023.01)
CPC G06F 16/337 (2019.01) [G06F 16/24573 (2019.01); G06F 16/9014 (2019.01); G06Q 40/03 (2023.01); G06F 16/9535 (2019.01)] 20 Claims
1. A computer-implemented method to determine account holder identities for collected event information, the computer-implemented method comprising:
as implemented by one or more computing devices comprising one or more hardware processors and configured with specific computer-executable instructions,
receiving, from a plurality of data sources, a plurality of event information that comprises heterogeneous data structures;
for each event information:
accessing a data store including associations between data sources and identifier parameters, the identifier parameters including at least an indication of one or more identifiers included in event information from the corresponding data source;
determining, based at least on the identifier parameters of the data source of the event information, identifiers included in the event information as indicated in the accessed data store; and
extracting identifiers from the event information based at least on the corresponding identifier parameters, wherein a combination of the identifiers comprises a unique identity associated with a unique user;
accessing a plurality of hash functions, each associated with a combination of identifiers;
for each unique identity, calculating a plurality of hashes by evaluating the plurality of hash functions;
based on whether unique identities share a common hash calculated with a common hash function, selectively grouping unique identities into sets of unique identities associated with common hashes;
for each set of unique identities:
applying one or more match rules including criteria for comparing unique identities within the set; and
determining a matching set of unique identities as those meeting one or more of the match rules;
merging matching sets of unique identities each including at least one common unique identity to provide one or more merged sets comprising no unique identity in common with other merged sets by repeating until the matching sets are merged, a process of creating pairs of records from each matching set, reversing each pair, and grouping by leftmost record where the leftmost record is common between the pairs, each merged set associated with one user;
for each merged set:
determining an inverted personal identifier; and
associating the inverted personal identifier to each of the unique identities in the merged set to create an inverted personal identifier map;
for each unique identity, using the inverted personal identifier map to:
identify event information associated with at least one of the combinations of identifiers associated with the unique identity; and
associate the inverted personal identifier with the identified event information, wherein each inverted personal identifier is associated with multiple unique identities in the merged set associated with the unique user and wherein the identified event information is associated with multiple events that are associated with the unique user.
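
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The source names, field names, hash-key combinations, match rule, and the `PID-n` label format are all hypothetical and chosen only for illustration. The merge step uses a standard union-find structure as a stand-in for the claimed process of creating pairs, reversing each pair, and grouping by leftmost record. Both approaches aim at merged sets with no unique identity in common.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

# Hypothetical data store: which identifier fields each data source supplies.
IDENTIFIER_PARAMS = {
    "bank_feed": ["name", "ssn", "email"],
    "web_log": ["email", "phone"],
}

# Hypothetical hash functions, each tied to one combination of identifiers.
HASH_KEYS = [("ssn",), ("email",), ("phone",)]


def extract_identity(source, event):
    """Extract the identifiers that the source's parameters say are present."""
    fields = IDENTIFIER_PARAMS[source]
    return tuple(sorted((f, event[f]) for f in fields if event.get(f)))


def hashes_for(identity):
    """Evaluate every hash function whose identifier combination is available."""
    ident = dict(identity)
    out = []
    for fn_index, keys in enumerate(HASH_KEYS):
        if all(k in ident for k in keys):
            raw = "|".join(ident[k] for k in keys)
            out.append((fn_index, hashlib.sha256(raw.encode()).hexdigest()))
    return out


def match(a, b):
    """Hypothetical match rule: no conflicting ssn, and one shared identifier."""
    da, db = dict(a), dict(b)
    if "ssn" in da and "ssn" in db and da["ssn"] != db["ssn"]:
        return False
    return any(k in da and da[k] == db.get(k) for k in ("ssn", "email", "phone"))


def resolve(events):
    """Return (inverted personal identifier, event) for each (source, event)."""
    identities = [extract_identity(src, ev) for src, ev in events]
    unique = sorted(set(identities))

    # Group unique identities that share a hash from the same hash function.
    buckets = defaultdict(set)
    for ident in unique:
        for h in hashes_for(ident):
            buckets[h].add(ident)

    # Union-find stand-in for the claimed pair-reversal merge process.
    parent = {ident: ident for ident in unique}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Apply the match rule only within each hash bucket, then merge matches.
    for group in buckets.values():
        for a, b in combinations(sorted(group), 2):
            if match(a, b):
                parent[find(a)] = find(b)

    merged = defaultdict(list)
    for ident in unique:
        merged[find(ident)].append(ident)

    # Build the inverted personal identifier map: identity -> one label per set.
    pid_map = {}
    for n, members in enumerate(sorted(merged.values())):
        for ident in members:
            pid_map[ident] = f"PID-{n}"

    return [(pid_map[ident], ev) for ident, (_, ev) in zip(identities, events)]


# Fabricated sample events for illustration only.
EVENTS = [
    ("bank_feed", {"name": "Ann Lee", "ssn": "123", "email": "ann@x.com"}),
    ("web_log", {"email": "ann@x.com", "phone": "555"}),
    ("web_log", {"email": "ann2@x.com", "phone": "555"}),
    ("bank_feed", {"name": "Bob Ray", "ssn": "999", "email": "bob@y.com"}),
]
```

Calling `resolve(EVENTS)` on the sample data places the first three events under one label, since the first two share an email and the second and third share a phone number, while the fourth event receives a separate label. Comparing identities only inside hash buckets avoids an all-pairs comparison across the full identity set.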