US 12,386,797 B2
Multi-service business platform system having entity resolution systems and methods
Hector Urdiales, Dublin (IE); Marco Lagi, Medford, MA (US); Stuart P. Layton, Waltham, MA (US); Bryan Ash, Arlington, MA (US); Sophie Higgs, Cambridge, MA (US); Robert McEneaney, Concord, MA (US); Dylan Sellberg, Swampscott, MA (US); Anna Coffey, Wenham, MA (US); Jared Williams, Somerville, MA (US); and Stephen J. Purcell, Dublin (IE)
Assigned to HUBSPOT, INC., Cambridge, MA (US)
Filed by HUBSPOT, INC., Cambridge, MA (US)
Filed on Sep. 8, 2023, as Appl. No. 18/244,042.
Application 18/244,042 is a continuation in part of application No. 17/318,737, filed on May 12, 2021, granted, now 11,775,494.
Claims priority of provisional application 63/080,900, filed on Sep. 21, 2020.
Claims priority of provisional application 63/023,406, filed on May 12, 2020.
Prior Publication US 2023/0418793 A1, Dec. 28, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/215 (2019.01)
CPC G06F 16/215 (2019.01)
20 Claims
 
1. A method, comprising:
selecting a quantity and type of neural networks to use for deduplicating objects based upon computing resources and training time;
processing, by a neural network having a type selected for deduplicating the objects, a set of entity feature encodings of entities for deduplication to generate a first reduced vector and a second reduced vector, wherein the neural network is implemented as a vector dimension reducing neural network to produce a reduced dimension entity-specific vector for each entity in a pair that represents feature vectors for a corresponding entity in the pair;
generating representations of likelihoods that entity pairs of the entities are duplicates based upon a dot product of the first reduced vector and the second reduced vector;
training a machine learning model using the likelihoods that entity pairs of the entities are duplicates and p-merge values corresponding to a pair of entities for which a current reduced dimension entity-specific vector is generated by the neural network; and
utilizing the machine learning model to deduplicate the objects within a database, wherein a front-end text encoder, a middle stage trained neural network, and a back-end merge indicator function are selected and used to process the entity pairs for deduplicating the objects, wherein the middle stage trained neural network is replicated to scale the machine learning model for concurrently handling multiple pairs of entities.
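
The three-stage flow recited in claim 1 (front-end text encoder, middle stage dimension-reducing network, back-end merge indicator function) can be illustrated with a minimal sketch. Everything named here is hypothetical and is not taken from the patent: `encode_text`, `make_projection`, `reduce_vector`, `duplicate_likelihood`, and `merge_indicator` are illustrative names, the bigram-hashing encoder is a stand-in for whatever encoder is selected, the single tanh layer uses random, untrained weights, and the sigmoid over the dot product and the 0.5 threshold are assumptions. The training step that uses p-merge values is not shown.

```python
import math
import random


def encode_text(text, dim=16):
    """Hypothetical front-end encoder: hashed character-bigram counts, L2-normalized."""
    vec = [0.0] * dim
    s = text.lower().strip()
    for a, b in zip(s, s[1:]):
        vec[(ord(a) * 31 + ord(b)) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


def make_projection(in_dim, out_dim, seed=0):
    """Random (untrained) weights standing in for the trained middle stage."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def reduce_vector(weights, features):
    """Middle-stage stand-in: one tanh layer mapping in_dim features to out_dim."""
    return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in weights]


def duplicate_likelihood(u, v):
    """Dot product of the two reduced vectors, squashed to (0, 1) by a sigmoid (assumed)."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 / (1.0 + math.exp(-dot))


def merge_indicator(likelihood, threshold=0.5):
    """Hypothetical back-end merge indicator: a simple threshold on the likelihood."""
    return likelihood >= threshold


# The same weights are applied to both entities in a pair.
W = make_projection(in_dim=16, out_dim=4)
first = reduce_vector(W, encode_text("HubSpot, Inc."))
second = reduce_vector(W, encode_text("Hubspot Inc"))
p = duplicate_likelihood(first, second)
print(p, merge_indicator(p))
```

Because the same weights `W` are applied to every entity, the middle stage in this sketch holds no per-pair state, which is one way to read the claim's statement that the middle stage is replicated to handle multiple entity pairs concurrently.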