CPC G06F 16/215 (2019.01) [G06F 16/22 (2019.01)]; 21 Claims

1. A computer-implemented method comprising:
receiving, by at least one processor, a dataset of entity records, the dataset comprising a plurality of entity records associated with one or more entities, wherein each entity record comprises at least one element;
identifying, by the at least one processor, a candidate entity record of the plurality of entity records;
utilizing, by the at least one processor, first predefined rules of a set of predefined rules to generate a first augmented record by augmenting the at least one element of the candidate entity record with a first augmentation type;
wherein the first augmentation type represents a positive contrast between the first augmented record and the candidate entity record based at least in part on the set of predefined rules;
utilizing, by the at least one processor, second predefined rules of the set of predefined rules to generate a second augmented record by augmenting the at least one element of the candidate entity record with a second augmentation type;
wherein the second augmentation type represents a negative contrast between the second augmented record and the candidate entity record based at least in part on the set of predefined rules;
utilizing, by the at least one processor, at least one contrastive loss optimization to train parameters of an unsupervised self-contrastive machine learning language model to distinguish between similar entity records representing a same entity and dissimilar entity records representing different entities based at least in part on the at least one element of each entity record;
wherein the at least one contrastive loss optimization trains the parameters based at least in part on:
the positive contrast between the first augmented record and the candidate entity record, and
the negative contrast between the second augmented record and the candidate entity record; and
utilizing, by the at least one processor, an index engine to index the entity records determined to have a positive contrast with the candidate entity record;
wherein the index engine indexes the entity records determined to have the positive contrast into at least one database table so as to merge those entity records with the candidate entity record.
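The claim leaves the predefined rules, the language model, and the contrastive loss unspecified, so the following is only a toy, stdlib-only sketch of the claimed flow under loudly stated assumptions. The hypothetical `featurize` function (character trigrams with one learned weight per trigram) stands in for the unsupervised self-contrastive language model; the abbreviation rules in `positive_augment` and the digit-shift rule in `negative_augment` are invented examples of the first and second predefined rules; the margin-based contrastive loss d(a, p)² + max(0, m − d(a, n))² is one common choice swapped in for "at least one contrastive loss optimization"; and an in-memory SQLite table whose rows share an `entity_id` stands in for the index engine merging positive-contrast records. All names, records, and hyperparameters are illustrative, not taken from the source.

```python
import math
import sqlite3
from collections import Counter

Record = dict[str, str]  # hypothetical layout: element name -> element text

# Hypothetical "first predefined rules": surface edits that keep the same entity.
ABBREVIATIONS = {"Street": "St", "Incorporated": "Inc"}
# Hypothetical "second predefined rules": shift every digit (0->1, ..., 9->0),
# so a house or phone number now points at a different entity.
DIGIT_SHIFT = str.maketrans("0123456789", "1234567890")


def featurize(record: Record) -> Counter:
    """Character-trigram counts over all elements (toy stand-in for the model)."""
    text = " ".join(record[k] for k in sorted(record)).lower()
    padded = f"  {text}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def positive_augment(record: Record) -> Record:
    """First augmentation type: drop punctuation and abbreviate common words."""
    out = {k: v.replace(".", "").replace(",", "") for k, v in record.items()}
    for long_form, short_form in ABBREVIATIONS.items():
        out = {k: v.replace(long_form, short_form) for k, v in out.items()}
    return out


def negative_augment(record: Record) -> Record:
    """Second augmentation type: perturb the identifying digits."""
    return {k: v.translate(DIGIT_SHIFT) for k, v in record.items()}


def distance(w: dict[str, float], a: Counter, b: Counter) -> float:
    """Euclidean distance with each trigram scaled by its weight (default 1.0)."""
    return math.sqrt(sum(w.get(g, 1.0) ** 2 * (a[g] - b[g]) ** 2
                         for g in set(a) | set(b)))


def train(records: list[Record], epochs: int = 20, lr: float = 0.05,
          margin: float = 4.0) -> dict[str, float]:
    """Gradient descent on d(a, p)**2 + max(0, margin - d(a, n))**2 per record."""
    w: dict[str, float] = {}
    for _ in range(epochs):
        for rec in records:
            a = featurize(rec)
            p = featurize(positive_augment(rec))
            n = featurize(negative_augment(rec))
            grad: dict[str, float] = {}
            # Positive contrast: shrink weights of trigrams that differ between a and p.
            for g in set(a) | set(p):
                grad[g] = grad.get(g, 0.0) + 2 * w.get(g, 1.0) * (a[g] - p[g]) ** 2
            # Negative contrast: grow weights of trigrams that differ between a and n,
            # but only while the pair is still closer than the margin.
            d_n = distance(w, a, n)
            if 0 < d_n < margin:
                for g in set(a) | set(n):
                    grad[g] = grad.get(g, 0.0) - (
                        2 * (margin - d_n) * w.get(g, 1.0) * (a[g] - n[g]) ** 2 / d_n)
            for g, dg in grad.items():
                if dg:
                    w[g] = w.get(g, 1.0) - lr * dg
    return w


def index_matches(records: list[Record], candidate: int, w: dict[str, float],
                  threshold: float = 1.0) -> sqlite3.Connection:
    """Index every record; positive-contrast matches share the candidate's entity_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entity_index "
                 "(record_id INTEGER PRIMARY KEY, entity_id INTEGER, name TEXT)")
    cand = featurize(records[candidate])
    for i, rec in enumerate(records):
        same = distance(w, cand, featurize(rec)) < threshold
        conn.execute("INSERT INTO entity_index VALUES (?, ?, ?)",
                     (i, candidate if same else i, rec["name"]))
    conn.commit()
    return conn


records = [
    {"name": "Acme Corp.", "address": "12 Main Street"},
    {"name": "ACME Corp", "address": "12 Main St"},
    {"name": "Globex Inc.", "address": "99 Oak Avenue"},
]
weights = train(records)
conn = index_matches(records, candidate=0, w=weights)
print(conn.execute("SELECT record_id FROM entity_index "
                   "WHERE entity_id = 0 ORDER BY record_id").fetchall())
# expected: [(0,), (1,)]
```

Before training every trigram has weight 1.0, and records 0 and 1 sit about 3.3 apart, so they would not be merged at `threshold=1.0`. Training shrinks the weights of the punctuation and abbreviation trigrams (the positive contrast) and grows those of the digit trigrams until each negative pair reaches the margin (the negative contrast), after which the two spellings of the same company fall under the threshold and share one `entity_id`, while the Globex record keeps its own.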