| CPC G06F 16/215 (2019.01) | 20 Claims |

1. A method, comprising:
selecting a quantity and type of neural networks to use for deduplicating objects based upon computing resources and training time;
processing, by a neural network having a type selected for deduplicating the objects, a set of entity feature encodings of entities for deduplication to generate a first reduced vector and a second reduced vector, wherein the neural network is implemented as a vector-dimension-reducing neural network that produces, for each entity in a pair, a reduced-dimension entity-specific vector representing the feature vectors of that entity;
generating representations of likelihoods that entity pairs of the entities are duplicates based upon a dot product of the first reduced vector and the second reduced vector;
training a machine learning model using the likelihoods that entity pairs of the entities are duplicates and p-merge values corresponding to a pair of entities for which a current reduced-dimension entity-specific vector is generated by the neural network; and
utilizing the machine learning model to deduplicate the objects within a database, wherein a front-end text encoder, a middle-stage trained neural network, and a back-end merge indicator function are selected and used to process the entity pairs for deduplicating the objects, and wherein the middle-stage trained neural network is replicated to scale the machine learning model for concurrently handling multiple pairs of entities.
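The reduce, dot-product, and train steps of claim 1 can be illustrated with a small NumPy sketch. It is not the patented implementation; every design choice below is an assumption. The dimension-reducing network is taken to be a single shared linear layer followed by `tanh`. The likelihood is the sigmoid of the dot product of the two reduced vectors. The p-merge values are treated as per-pair target merge probabilities fitted with binary cross-entropy. The names `DimensionReducer` and `make_pairs` are hypothetical, and the training data is synthetic.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class DimensionReducer:
    """Hypothetical shared (Siamese-style) reducer: one weight matrix W maps every
    d-dimensional entity feature encoding to a k-dimensional vector tanh(x @ W)."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=1.0 / np.sqrt(in_dim), size=(in_dim, out_dim))

    def reduce(self, X):
        return np.tanh(X @ self.W)

    def likelihood(self, X1, X2):
        # Row-wise dot product of the first and second reduced vectors, squashed to (0, 1).
        r1, r2 = self.reduce(X1), self.reduce(X2)
        return sigmoid(np.sum(r1 * r2, axis=1))

    def train_step(self, X1, X2, p_merge, lr=0.1):
        """One gradient-descent step on binary cross-entropy against the p-merge
        targets (assumed to be probabilities in [0, 1]). Returns the pre-update loss."""
        r1, r2 = self.reduce(X1), self.reduce(X2)
        p = sigmoid(np.sum(r1 * r2, axis=1))
        eps = 1e-12
        loss = -np.mean(p_merge * np.log(p + eps) + (1 - p_merge) * np.log(1 - p + eps))
        ds = (p - p_merge) / len(p)               # dLoss/d(dot product), averaged over pairs
        dz1 = ds[:, None] * r2 * (1 - r1 ** 2)    # chain rule through r1 = tanh(X1 @ W)
        dz2 = ds[:, None] * r1 * (1 - r2 ** 2)    # chain rule through r2 = tanh(X2 @ W)
        self.W -= lr * (X1.T @ dz1 + X2.T @ dz2)  # W is shared, so both gradients add
        return float(loss)


def make_pairs(n=200, d=16, seed=1):
    """Synthetic data: the first half of the pairs are noisy duplicates (p-merge 1.0),
    the second half are unrelated entities (p-merge 0.0)."""
    rng = np.random.default_rng(seed)
    X1 = rng.normal(size=(n, d))
    X2 = rng.normal(size=(n, d))
    is_dup = np.arange(n) < n // 2
    X2[is_dup] = X1[is_dup] + 0.1 * rng.normal(size=(int(is_dup.sum()), d))
    return X1, X2, is_dup.astype(float)


X1, X2, p_merge = make_pairs()
model = DimensionReducer(in_dim=16, out_dim=4)
losses = [model.train_step(X1, X2, p_merge) for _ in range(300)]
probs = model.likelihood(X1, X2)
```

Sharing one weight matrix across both entities of a pair keeps the score symmetric: swapping the two entities yields the same likelihood. A production system would likely use a deeper network and labelled or heuristic p-merge values instead of synthetic ones.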
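The three-stage arrangement in the final step can be sketched as a front-end text encoder, a middle-stage network replicated across worker threads, and a back-end merge indicator. This is an illustrative sketch only, and each stage is an assumption. The encoder uses hashed character trigrams, with `zlib.crc32` chosen over `hash()` so encodings stay stable across runs. The middle stage is a `tanh` projection whose weights `W` stand in for a trained network. The merge indicator thresholds the sigmoid of the dot product at 0.5. `ReducerReplica`, `merge_indicator`, and `dedupe_pairs` are hypothetical names.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def text_encoder(text, dim=64, n=3):
    """Hypothetical front-end encoder: L2-normalized hashed character n-gram counts."""
    vec = np.zeros(dim)
    padded = f"#{text.lower()}#"
    for i in range(len(padded) - n + 1):
        vec[zlib.crc32(padded[i:i + n].encode()) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


class ReducerReplica:
    """One copy of the middle-stage network; all replicas hold the same weights."""

    def __init__(self, W):
        self.W = W.copy()

    def score(self, x1, x2):
        # Read-only computation, so concurrent calls on one replica are safe.
        r1, r2 = np.tanh(x1 @ self.W), np.tanh(x2 @ self.W)
        return float(r1 @ r2)


def merge_indicator(score, threshold=0.5):
    """Hypothetical back-end merge indicator: sigmoid of the score vs. a threshold."""
    return bool(1.0 / (1.0 + np.exp(-score)) >= threshold)


def dedupe_pairs(pairs, W, replicas=4):
    """Encode each text pair, score it on one of `replicas` copies of the middle stage
    (assigned round-robin, run concurrently), and return one merge flag per pair."""
    pool = [ReducerReplica(W) for _ in range(replicas)]

    def run(indexed_pair):
        i, (a, b) = indexed_pair
        replica = pool[i % replicas]
        return merge_indicator(replica.score(text_encoder(a), text_encoder(b)))

    with ThreadPoolExecutor(max_workers=replicas) as ex:
        return list(ex.map(run, enumerate(pairs)))  # map() preserves input order


rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(64, 8))  # stand-in for trained middle-stage weights
pairs = [("Acme Corp", "Acme Corp"), ("Acme Corp", "ACME Corporation"), ("Globex", "Initech")]
flags = dedupe_pairs(pairs, W, replicas=4)
```

Because every replica holds identical weights, the flags do not depend on how many replicas are used. Adding replicas only increases how many pairs are scored at once. For CPU-bound models in pure Python, a process pool or a batched matrix multiply would likely scale better than threads.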