US 11,868,723 B2
	Interpreting text-based similarity
Itzik Malkiel, Givaatayim (IL); Noam Koenigstein, Tel Aviv (IL); Oren Barkan, Tel Aviv (IL); Dvir Ginzburg, Tel Aviv (IL); and Nir Nice, Salit (IL)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Mar. 30, 2021, as Appl. No. 17/218,136.
Prior Publication US 2022/0318504 A1, Oct. 6, 2022
Int. Cl. G06F 40/284 (2020.01); G06F 16/9538 (2019.01); G06F 40/295 (2020.01)

CPC G06F 40/284 (2020.01) [G06F 16/9538 (2019.01); G06F 40/295 (2020.01)]

20 Claims

1. A system comprising:

a processor; and

a memory comprising computer-readable instructions, the memory and the computer-readable instructions configured to, with the processor, implement a pre-trained interpreting text-based similarity (ITBS) model, to cause the processor to:

calculate a set of gradients representing a first unlabeled text-based paragraph describing a seed item and a second unlabeled text-based paragraph describing a recommended item predicted to be similar to the seed item, the set of gradients calculated with respect to a cosine similarity function applied on a set of feature vectors, the set of feature vectors comprising a first feature vector representing the first unlabeled text-based paragraph and a second feature vector representing the second unlabeled text-based paragraph, wherein the first unlabeled text-based paragraph and the second unlabeled text-based paragraph comprise an unlabeled paragraph pair;

generate contextualized embeddings based on the set of gradients and a similarity score measuring an affinity between the first unlabeled text-based paragraph and the second unlabeled text-based paragraph, wherein generating the contextualized embeddings includes:

tokenizing the first unlabeled text-based paragraph and the second unlabeled text-based paragraph;

generating a saliency score for each token in the first unlabeled text-based paragraph and for each token in the second unlabeled text-based paragraph, wherein the saliency score is associated with at least one word in an item description;

aggregating the token saliency scores of the first unlabeled text-based paragraph to generate word-scores for the first unlabeled text-based paragraph;

aggregating the token saliency scores of the second unlabeled text-based paragraph to generate word-scores for the second unlabeled text-based paragraph;

matching words from the first unlabeled text-based paragraph and the second unlabeled text-based paragraph based on the similarity score to generate a set of word-pairs, each word-pair in the set of word-pairs comprising a first word selected from the first unlabeled text-based paragraph matched to a second word selected from the second unlabeled text-based paragraph, wherein the first word and the second word have a similar semantic meaning; and

scoring each word-pair using the generated word-scores of the aggregated token saliency scores for both the first unlabeled text-based paragraph and the second unlabeled text-based paragraph to generate a word-pair score, the word-pair score indicating a degree of influence exerted by an individual word-pair on selection of the recommended item from a plurality of candidate items;

select a word-pair from the set of word-pairs based on the word-pair score and a threshold value; and

interpret, based on the selected word-pair, a recommendation generated by a recommendation model.
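
The gradient and saliency steps of claim 1 can be illustrated with a small sketch. The claim does not say how the feature vectors are produced or how a gradient becomes a per-token saliency score, so this sketch makes two assumptions for illustration only. First, each paragraph's feature vector is the mean of its token embeddings (mean pooling). Second, a token's saliency is the dot product of its embedding with the gradient of the cosine similarity (a gradient-times-input attribution). The claimed pre-trained ITBS model is not reproduced here, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Sequence[float]) -> float:
    return math.sqrt(dot(a, a))


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, used here as the similarity score."""
    return dot(u, v) / (norm(u) * norm(v))


def cosine_grad_wrt_u(u: Sequence[float], v: Sequence[float]) -> Vector:
    """Analytic gradient of cos(u, v) with respect to u:
    v / (|u||v|) - cos(u, v) * u / |u|^2."""
    nu, nv = norm(u), norm(v)
    c = dot(u, v) / (nu * nv)
    return [vi / (nu * nv) - c * ui / (nu * nu) for ui, vi in zip(u, v)]


def mean_pool(token_embs: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical feature extractor: average of the token embeddings."""
    n = len(token_embs)
    return [sum(col) / n for col in zip(*token_embs)]


def token_saliency(
    tokens_a: Sequence[Sequence[float]], tokens_b: Sequence[Sequence[float]]
) -> List[float]:
    """Gradient-times-input saliency of each token in paragraph A
    with respect to cos(feature(A), feature(B))."""
    u, v = mean_pool(tokens_a), mean_pool(tokens_b)
    g = cosine_grad_wrt_u(u, v)
    n = len(tokens_a)
    # With mean pooling, d u / d e_i = I / n, so d cos / d e_i = g / n.
    return [dot(g, e) / n for e in tokens_a]


def pair_saliency(
    tokens_a: Sequence[Sequence[float]], tokens_b: Sequence[Sequence[float]]
) -> Tuple[float, List[float], List[float]]:
    """Similarity score plus token saliencies for both paragraphs."""
    score = cosine(mean_pool(tokens_a), mean_pool(tokens_b))
    return score, token_saliency(tokens_a, tokens_b), token_saliency(tokens_b, tokens_a)
```

Cosine similarity ignores the length of `u`, so its gradient is orthogonal to `u`. Under mean pooling, each paragraph's token saliencies therefore sum to zero. Positive scores mark tokens that pull the paragraph toward its partner, and negative scores mark tokens that pull it away.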
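
The word-level steps of claim 1 are aggregating token saliency scores into word-scores, matching semantically similar words into word-pairs, scoring each pair, and selecting pairs against a threshold. They could look roughly like the following sketch. The claim ties matching to a similarity score without detailing the procedure, so the following are assumptions rather than the claimed method:

- tokens follow the WordPiece `##` continuation convention;
- a word-score is the sum of its token saliencies;
- word similarity is the cosine between caller-supplied word vectors (the hypothetical `word_vecs` argument);
- each word of the first paragraph is greedily matched to its most similar word in the second, subject to a hypothetical `min_sim` cutoff;
- a word-pair score is the product of the two word-scores after clipping negatives to zero.

```python
import math
from typing import Dict, List, Sequence, Tuple

Word = Tuple[str, float]               # (word, word-score)
Pair = Tuple[str, str, float]          # (word from A, word from B, pair score)


def words_from_tokens(tokens: Sequence[str], scores: Sequence[float]) -> List[Word]:
    """Merge '##' continuation tokens into whole words, summing token saliencies."""
    words: List[Word] = []
    for tok, s in zip(tokens, scores):
        if tok.startswith("##") and words:
            w, acc = words[-1]
            words[-1] = (w + tok[2:], acc + s)
        else:
            words.append((tok, s))
    return words


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def match_and_score(
    words_a: Sequence[Word],
    words_b: Sequence[Word],
    word_vecs: Dict[str, Sequence[float]],
    min_sim: float = 0.5,
) -> List[Pair]:
    """Greedily pair each word of A with its most similar word of B and score the pair."""
    pairs: List[Pair] = []
    for wa, sa in words_a:
        if wa not in word_vecs:
            continue
        best = None  # (similarity, word_b, score_b)
        for wb, sb in words_b:
            if wb not in word_vecs:
                continue
            sim = cosine(word_vecs[wa], word_vecs[wb])
            if sim >= min_sim and (best is None or sim > best[0]):
                best = (sim, wb, sb)
        if best is not None:
            _, wb, sb = best
            # A pair ranks high only if both of its words are salient.
            pairs.append((wa, wb, max(sa, 0.0) * max(sb, 0.0)))
    return pairs


def select_pairs(pairs: Sequence[Pair], threshold: float) -> List[Pair]:
    """Keep pairs whose score reaches the threshold, highest score first."""
    kept = [p for p in pairs if p[2] >= threshold]
    return sorted(kept, key=lambda p: p[2], reverse=True)
```

The product in `match_and_score` means a pair cannot rank high on the strength of one salient word alone. A sum or minimum would be an equally plausible reading of "using the generated word-scores ... for both" paragraphs.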