| CPC G06F 40/295 (2020.01) [G06F 16/282 (2019.01); G06N 7/01 (2023.01); G06N 20/00 (2019.01)] | 20 Claims |

|
1. A method comprising:
training a machine learning (ML) model with inputs including:
a dataset defining entities and including co-occurrence data, the co-occurrence data including co-occurrence relations for pairs of the entities within the dataset; and
side information defining explicit relations between the pairs of the entities, the side information including an explicitly-provided relation type identifier that identifies a relationship type for multiple of the pairs of the entities in the dataset; and
jointly modeling, with the ML model, the co-occurrence relations and the explicit relations for the pairs of the entities to compute similarity metrics for different pairs of the entities within the dataset, the similarity metrics for a given pair of the entities including:
a first probability distribution representing a relational similarity between two entities of the given pair based on the co-occurrence relations for the two entities within the dataset;
a second probability distribution representing a second relational similarity between the two entities of the given pair based on the explicitly-provided relation type identifier for the given pair; and
a joint probability distribution that is based on the first probability distribution and the second probability distribution;
based on the similarity metrics, identifying a most similar replacement entity from the dataset for each of the entities within the dataset; and
for a select entity received as an input, outputting the most similar replacement entity.
|
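The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: the claim recites a trained ML model, whereas this toy version substitutes raw co-occurrence counts and hand-set relation weights for learned parameters. The names `most_similar_replacements`, `baskets`, `side_info`, and `relation_weights` are hypothetical, and forming the joint distribution as a normalized product of the two distributions is an assumption; the claim only requires that the joint distribution be based on both.

```python
import math
from collections import defaultdict
from itertools import combinations


def softmax(scores):
    """Turn a dict of raw scores into a probability distribution."""
    peak = max(scores.values())
    exps = {k: math.exp(v - peak) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


def most_similar_replacements(baskets, side_info, relation_weights):
    """Return, for each entity, the candidate with the highest joint probability.

    baskets: list of entity lists (the dataset with co-occurrence data).
    side_info: maps frozenset({a, b}) to an explicit relation type identifier.
    relation_weights: maps a relation type identifier to a score
        (hand-set here; an assumption standing in for learned parameters).
    """
    entities = sorted({e for basket in baskets for e in basket})

    # Count how often each pair of entities appears in the same basket.
    cooc = defaultdict(int)
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            cooc[(a, b)] += 1
            cooc[(b, a)] += 1

    replacements = {}
    for entity in entities:
        others = [o for o in entities if o != entity]

        # First distribution: similarity from co-occurrence relations.
        p_cooc = softmax({o: math.log1p(cooc[(entity, o)]) for o in others})

        # Second distribution: similarity from the explicit relation type.
        # Pairs without side information score 0.0.
        p_rel = softmax({
            o: relation_weights.get(side_info.get(frozenset((entity, o))), 0.0)
            for o in others
        })

        # Joint distribution: normalized product of the two (an assumption).
        joint = {o: p_cooc[o] * p_rel[o] for o in others}
        norm = sum(joint.values())
        joint = {o: v / norm for o, v in joint.items()}

        replacements[entity] = max(others, key=lambda o: joint[o])
    return replacements


# Toy data: "cola" and "diet_cola" never co-occur but share an explicit relation.
baskets = [["cola", "chips"], ["cola", "chips"],
           ["diet_cola", "chips"], ["cola", "salsa"]]
side_info = {frozenset(("cola", "diet_cola")): "same_category"}
relation_weights = {"same_category": 3.0}

replacements = most_similar_replacements(baskets, side_info, relation_weights)
print(replacements["cola"])
```

In this toy data, co-occurrence alone would favor "chips" as the nearest neighbor of "cola"; the explicit relation type shifts the joint distribution toward "diet_cola", which is the effect the joint modeling step is meant to capture.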