US 12,105,845 B2
System and method for entity resolution of a data element
Michael Busha, Menlo Park, CA (US); Jiachen Mao, San Jose, CA (US); Michael Rinehart, Pleasanton, CA (US); and Rehan Jalil, Portola Valley, CA (US)
Assigned to SECURITI, Inc., Coyote, CA (US)
Filed by SECURITI, Inc., Coyote, CA (US)
Filed on Nov. 6, 2020, as Appl. No. 17/090,943.
Claims priority of provisional application 63/062,809, filed on Aug. 7, 2020.
Prior Publication US 2022/0043934 A1, Feb. 10, 2022
Int. Cl. G06F 21/62 (2013.01); G06F 16/75 (2019.01); G06F 17/18 (2006.01); G06F 18/15 (2023.01); G06F 18/21 (2023.01); G06F 21/60 (2013.01); G06Q 50/00 (2024.01); G06Q 50/16 (2024.01)
CPC G06F 21/6254 (2013.01) [G06F 16/75 (2019.01); G06F 17/18 (2013.01); G06F 18/21 (2023.01); G06F 21/602 (2013.01); G06Q 50/01 (2013.01); G06Q 50/16 (2013.01); G06F 18/15 (2023.01)]
16 Claims
 
1. A computer implemented system for entity resolution of a data element comprising:
a hardware processor; and
a memory coupled to the hardware processor, wherein the memory comprises a set of program instructions in the form of a processing subsystem, configured to be executed by the hardware processor, wherein the processing subsystem is hosted on a server and configured to execute on a network to control bidirectional communications among a plurality of subsystems comprising:
an entity reference parsing subsystem configured to parse one or more entity references of a corresponding seed set of an entity of the data element into corresponding one or more personal data properties and one or more property values;
a property value standardization subsystem operatively coupled to the entity reference parsing subsystem, wherein the property value standardization subsystem is configured to:
determine one or more standardization operations corresponding to a type of data of the one or more property values based on the one or more entity references parsed; and
perform the one or more standardization operations determined for standardization of the corresponding one or more property values;
a property value anonymization subsystem operatively coupled to the property value standardization subsystem, wherein the property value anonymization subsystem is configured to secure the one or more property values by performing one or more anonymization procedures based on the one or more standardization operations performed;
a property strength quantification subsystem operatively coupled to the property value anonymization subsystem, wherein the property strength quantification subsystem is configured to:
identify at least one additional property suspected to belong to the seed set of the entity based on an observation of the one or more entity references upon anonymization of the one or more property values;
assign a property strength score to the at least one additional property identified by utilizing one or more property strength quantification models based on observation of the one or more property values for the one or more entity references, wherein the one or more property strength quantification models comprises one or more statistical models implemented using a maximum likelihood estimation technique comprising a binary classification process and a probabilistic enhancement technique for assignment of the property strength score based on calculation of probability of a corresponding property value by using a sigmoid function; and
add the at least one additional property to the corresponding seed set of the entity based on the property strength score assigned to the at least one additional property;
a local entity resolution subsystem operatively coupled to the property standardization subsystem, wherein the local entity resolution subsystem is configured to perform a first entity resolution process based on comparison of an entity reference among the one or more entity references with each of the entity respectively from the corresponding seed set of the entity at a predetermined time interval,
wherein the first entity resolution process comprises at least one of a pre-defined heuristic example between the entity and the entity reference, an evidence accumulation between the entity and the entity reference using a one-degree Comparison function or an n-degree Comparison function, a standardized and anonymized comparison function, a property strength integration or a combination thereof,
wherein the evidence accumulation calculates a probability between the entity and the entity reference by considering that the entity reference belongs to the entity by using convolution of a one-degree Comparison function or an n-degree Comparison function of individual properties, wherein probability calculation comprises a property-specific function and a function to combine a set of individual comparison probabilities,
wherein the property-specific function is configured to determine how likely the entity reference is to resolve a specific entity considering that the property is in isolation and the function comprises a Pearson method or a Fisher method for combining P-values of the Bayes approach to combining probabilities;
a global entity resolution subsystem operatively coupled to the local entity resolution subsystem, wherein the global entity resolution subsystem is configured to perform a second entity resolution process based on comparison of the one or more property values for the one or more entity references with one or more property values for the one or more entities; and
an entity creation and updating subsystem operatively coupled to the global entity resolution subsystem, wherein the entity creation and updating subsystem is configured to modify the corresponding seed set of the entity or the one or more entity references for entity resolution based on the property strength score assigned to the at least one additional property in the one or more entity references.
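
The following is an illustrative sketch only, not the patented implementation. It shows how three steps recited in claim 1 might look in Python: standardizing and anonymizing a property value, scoring a property with a sigmoid function, and combining per-property P-values with Fisher's method. All function names, the salt, the email-specific standardization rule, the use of salted SHA-256 as the anonymization procedure, and the linear weight and bias fed to the sigmoid are assumptions introduced for illustration; the claim does not specify any of them.

```python
import hashlib
import math


def standardize_email(value: str) -> str:
    """Hypothetical standardization operation for an email-typed property value."""
    return value.strip().lower()


def anonymize(value: str, salt: str = "demo-salt") -> str:
    """Hypothetical anonymization procedure: salted SHA-256 of the standardized value."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def property_strength_score(feature: float, weight: float, bias: float) -> float:
    """Sigmoid of a linear score, as in a binary (logistic) classifier.

    In practice weight and bias would be fitted by maximum likelihood;
    here they are plain inputs (an assumption of this sketch).
    """
    return 1.0 / (1.0 + math.exp(-(weight * feature + bias)))


def fisher_combine(p_values: list[float]) -> float:
    """Combine independent P-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null hypothesis. For even degrees of
    freedom the survival function has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    if not p_values or any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p_values must be non-empty and lie in (0, 1]")
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0   # (X/2)^i / i! for i = 0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # Two spellings of the same address collapse to one anonymized token.
    a = anonymize(standardize_email("  Alice@Example.COM "))
    b = anonymize(standardize_email("alice@example.com"))
    print(a == b)
    print(property_strength_score(feature=2.0, weight=1.5, bias=-1.0))
    print(fisher_combine([0.05, 0.05]))
```

Standardizing before anonymizing is what allows two differently formatted references to be compared after hashing, which matches the ordering of the subsystems in the claim. Fisher's method was chosen here only because the claim names it; Pearson's method would be an equally valid illustration.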