US 11,947,571 B2
Efficient tagging of content items using multi-granular embeddings
Fares Hedayati, Richmond, CA (US); Young Jin Yun, San Francisco, CA (US); Sneha Chaudhari, Santa Clara, CA (US); Mahesh Subhash Joshi, Belmont, CA (US); Gungor Polatkan, San Jose, CA (US); and Gautam Borooah, Oakland, CA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Apr. 20, 2021, as Appl. No. 17/235,325.
Prior Publication US 2022/0335066 A1, Oct. 20, 2022
Int. Cl. G06F 16/28 (2019.01); G06N 20/00 (2019.01)
CPC G06F 16/285 (2019.01) [G06N 20/00 (2019.01)] 17 Claims
1. A method comprising:
storing a plurality of content items and, for each content item in the plurality of content items, a plurality of content embeddings associated with the content item;
storing a plurality of entity names and a plurality of entity name embeddings, wherein each entity name embedding of the plurality of entity name embeddings is associated with a different entity name of the plurality of entity names;
for a content item in the plurality of content items:
identifying a subset of the plurality of entity names;
for an entity name in the subset:
generating a plurality of similarity scores, wherein a similarity score of the plurality of similarity scores is based on a similarity measure applied to an entity name embedding associated with the entity name and a content embedding of the plurality of content embeddings;
generating a distribution of the plurality of similarity scores over the plurality of content embeddings;
extracting a plurality of feature values from the distribution, wherein a feature value extracted from the distribution corresponds to a similarity score of the plurality of similarity scores at a percentile of the distribution;
inputting the plurality of feature values extracted from the distribution into a binary classifier;
generating, by the binary classifier, based on the plurality of feature values extracted from the distribution, a classification, wherein the classification indicates a likelihood of the entity name being associated with a content item of the plurality of content items, wherein the binary classifier is trained to learn features that are predictive of whether particular entity names are associated with particular content items;
based on the classification, determining to associate the entity name with the content item; and
based on the association of the entity name with the content item, determining to transmit the content item to a user computing device;
wherein the method is performed by one or more computing devices.
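The scoring steps of claim 1 can be illustrated with a minimal Python sketch. The sketch rests on several assumptions, none of which the claim specifies. Cosine similarity serves as the "similarity measure". The "distribution" is simply the sorted list of per-embedding scores. The percentile grid (0, 25, 50, 75, 100) is hypothetical. A small logistic-regression model stands in for the "binary classifier". All function and class names are hypothetical, and this is not presented as the patentee's implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

# Hypothetical percentile grid; claim 1 only says features are scores "at a percentile".
PERCENTILES = (0, 25, 50, 75, 100)


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Assumed similarity measure between an entity name embedding and a content embedding."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def percentile(sorted_scores: List[float], p: float) -> float:
    """Linearly interpolated p-th percentile (0 <= p <= 100) of an ascending list."""
    if len(sorted_scores) == 1:
        return sorted_scores[0]
    rank = (p / 100.0) * (len(sorted_scores) - 1)
    lo = math.floor(rank)
    hi = min(lo + 1, len(sorted_scores) - 1)
    frac = rank - lo
    return sorted_scores[lo] + frac * (sorted_scores[hi] - sorted_scores[lo])


def extract_features(entity_emb: Vector, content_embs: List[Vector]) -> List[float]:
    """One similarity score per content embedding, then the score at each fixed percentile."""
    if not content_embs:
        raise ValueError("content item has no embeddings")
    scores = sorted(cosine_similarity(entity_emb, c) for c in content_embs)
    return [percentile(scores, p) for p in PERCENTILES]


class LogisticClassifier:
    """Hypothetical stand-in for the claimed binary classifier (plain logistic regression)."""

    def __init__(self, n_features: int) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0

    def predict_proba(self, x: Vector) -> float:
        """Likelihood that the entity name is associated with the content item."""
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        # Numerically stable logistic function.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def fit(self, xs: List[Vector], ys: List[int],
            lr: float = 0.5, epochs: int = 500) -> "LogisticClassifier":
        """Stochastic gradient descent on log loss over labeled (features, 0/1) pairs."""
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                err = self.predict_proba(x) - y  # d(log loss)/dz
                self.w = [wi - lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= lr * err
        return self


def tag_content_item(clf: LogisticClassifier, entity_emb: Vector,
                     content_embs: List[Vector], threshold: float = 0.5) -> bool:
    """Decide whether to associate the entity name with the item (assumed 0.5 cutoff)."""
    return clf.predict_proba(extract_features(entity_emb, content_embs)) >= threshold
```

To use it, build features with `extract_features` for content items already labeled as associated (1) or not (0) with an entity name, fit a `LogisticClassifier`, and call `tag_content_item` on new items. Items tagged `True` would then be candidates for transmission to a user computing device. One apparent benefit of percentile features is that they yield a fixed-length input to the classifier, regardless of how many content embeddings a given content item has.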