US 12,223,274 B2
Representation learning with side information
Oren Barkan, Rishon Lezion (IL); Avi Caciularu, Tel Aviv (IL); Idan Rejwan, Tel Aviv (IL); Yonathan Weill, Tel Aviv (IL); Noam Koenigstein, Tel-Aviv (IL); Ori Katz, Tel-Aviv (IL); Itzik Malkiel, Givaatayim (IL); and Nir Nice, Salit (IL)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Oct. 29, 2021, as Appl. No. 17/452,818.
Prior Publication US 2023/0137718 A1, May 4, 2023
Int. Cl. G06F 40/295 (2020.01); G06F 16/28 (2019.01); G06N 7/01 (2023.01); G06N 20/00 (2019.01)
CPC G06F 40/295 (2020.01) [G06F 16/282 (2019.01); G06N 7/01 (2023.01); G06N 20/00 (2019.01)]
20 Claims
 
1. A method comprising:
training a machine learning (ML) model with inputs including:
a dataset defining entities and including co-occurrence data, the co-occurrence data including co-occurrence relations for pairs of the entities within the dataset; and
side information defining explicit relations between the pairs of the entities, the side information including an explicitly-provided relation type identifier that identifies a relationship type for multiple of the pairs of the entities in the dataset; and
jointly modeling, with the ML model, the co-occurrence relations and the explicit relations for the pairs of the entities to compute similarity metrics for different pairs of the entities within the dataset, the similarity metrics for a given pair of the entities including:
a first probability distribution representing a relational similarity between two entities of the given pair based on the co-occurrence relations for the two entities within the dataset;
a second probability distribution representing a second relational similarity between the two entities of the given pair based on the explicitly-provided relation type identifier for the given pair; and
a joint probability distribution that is based on the first probability distribution and the second probability distribution;
based on the similarity metrics, identifying a most similar replacement entity from the dataset for each of the entities within the dataset; and
for a select entity received as an input, outputting the most similar replacement entity.
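
The flow recited in claim 1 can be illustrated with a small sketch. This is not the patented implementation: the claim recites a trained ML model, whereas the sketch below substitutes smoothed, normalized counts for learned parameters, and it assumes the joint distribution is the renormalized product of the two per-pair distributions. All names (`similarity_metrics`, `most_similar`, `relation_weight`, `smoothing`) and the toy data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def normalize(weights):
    """Scale non-negative weights so they sum to 1 (a probability distribution)."""
    total = sum(weights.values())
    return {key: value / total for key, value in weights.items()}


def similarity_metrics(entities, cooccurrence, side_info, relation_weight, smoothing=1.0):
    """For each entity, build three distributions over the other entities:
    'co' (from co-occurrence counts), 'rel' (from explicit relation types),
    and 'joint' (assumed here to be the renormalized product of the two)."""
    co = defaultdict(float)
    for (a, b), count in cooccurrence.items():
        co[(a, b)] += count  # treat co-occurrence as symmetric
        co[(b, a)] += count

    rel = defaultdict(float)
    for a, b, relation_type in side_info:
        weight = relation_weight[relation_type]  # hypothetical per-type strength
        rel[(a, b)] += weight
        rel[(b, a)] += weight

    metrics = {}
    for entity in entities:
        others = [o for o in entities if o != entity]
        # Additive smoothing keeps every candidate at non-zero probability.
        p_co = normalize({o: co[(entity, o)] + smoothing for o in others})
        p_rel = normalize({o: rel[(entity, o)] + smoothing for o in others})
        p_joint = normalize({o: p_co[o] * p_rel[o] for o in others})
        metrics[entity] = {"co": p_co, "rel": p_rel, "joint": p_joint}
    return metrics


def most_similar(metrics):
    """Pick, for every entity, the candidate with the highest joint probability."""
    return {e: max(m["joint"], key=m["joint"].get) for e, m in metrics.items()}


# Toy data (invented for this example).
entities = ["espresso", "latte", "tea", "mug"]
cooccurrence = {
    ("espresso", "mug"): 5,
    ("espresso", "latte"): 3,
    ("latte", "mug"): 4,
    ("tea", "mug"): 4,
}
side_info = [
    ("espresso", "latte", "same_category"),
    ("latte", "tea", "same_category"),
]
relation_weight = {"same_category": 5.0}

metrics = similarity_metrics(entities, cooccurrence, side_info, relation_weight)
replacements = most_similar(metrics)

# Output the precomputed replacement for a select input entity.
print(replacements["espresso"])  # latte
```

In this toy data, co-occurrence alone would rank "mug" highest for "espresso", because the two appear together most often. Once the explicit "same_category" relation is folded into the joint distribution, "latte" becomes the most similar replacement, which is the kind of correction side information is meant to provide.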