US 12,153,588 B2
Multimodal analysis for content item semantic retrieval and identification
Peter Martigny, San Francisco, CA (US); Fedor Bartosh, San Jose, CA (US); Danish Shaikh, Dale City, CA (US); Vinh Nguyen, San Jose, CA (US); Manasi Deshmukh, San Jose, CA (US); Ratul Ray, Santa Clara, CA (US); Nitish Aggarwal, Sunnyvale, CA (US); Srimaruti Manoj Nimmagadda, Saratoga, CA (US); Kapil Kumar, London (GB); and Sameer Girolkar, San Jose, CA (US)
Assigned to ROKU, INC., San Jose, CA (US)
Filed by ROKU, INC., San Jose, CA (US)
Filed on Feb. 10, 2023, as Appl. No. 18/167,724.
Prior Publication US 2024/0273105 A1, Aug. 15, 2024
Int. Cl. G06F 16/2457 (2019.01); G06F 16/242 (2019.01); G06F 16/9535 (2019.01)

CPC G06F 16/24578 (2019.01) [G06F 16/243 (2019.01); G06F 16/9535 (2019.01)]

20 Claims

1. A computer-implemented method, comprising:

generating for each content item of a plurality of content items of a repository, based on a similarity between a first vector for an embedding indicative of a first data type generated from a query associated with the plurality of content items and a first respective vector for an embedding indicative of the first data type generated for the content item input to a first predictive model trained to identify the similarity between the embedding indicative of the first data type generated from the query and the embedding indicative of the first data type generated for the content item, a respective first similarity score, and

generating for each content item of the plurality of content items, based on a similarity between a second vector for an embedding indicative of a second data type generated from the query and a second respective vector for an embedding indicative of the second data type generated for the content item input to a second predictive model trained to identify the similarity between the embedding indicative of the second data type generated from the query and the embedding indicative of the second data type generated for the content item, a respective second similarity score,

wherein the first predictive model and the second predictive model are trained via a training method comprising:

training the first predictive model on a first data set comprising labeled data indicating at least one candidate embedding-to-embedding pairing for the first data type, and the second predictive model on a second data set comprising labeled data indicating at least one candidate embedding-to-embedding pairing for the second data type,

generating a set of parameters for predicting data type-to-data type pairings based on the training,

introducing an unlabeled data set for another plurality of content items into the first predictive model and the second predictive model,

applying the set of parameters to the unlabeled data set, and

generating the respective first similarity scores and the respective second similarity scores based on the applied set of parameters;

normalizing, for each content item of the plurality of content items, the respective first similarity score and the respective second similarity score into a respective normalized similarity score;

identifying, based on the respective normalized similarity scores for the plurality of content items, a set of content items of the plurality of content items with respective normalized similarity scores that satisfy a similarity score threshold;

generating, based on an amount of tokenized keywords from the query mapped to respective tokenized keywords from a respective description of each content item of the plurality of content items, a respective mapping score for each content item of the plurality of content items;

identifying a set of content items of the plurality of content items with respective mapping scores that satisfy a mapping score threshold; and

outputting an indication of content items that are identified in the set of content items with respective normalized similarity scores that satisfy the similarity score threshold and identified in the set of content items with respective mapping scores that satisfy the mapping score threshold.
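
The retrieval steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and everything beyond the claim language is an assumption made for illustration: plain cosine similarity stands in for the two trained predictive models, the two data types are taken to be "text" and "image", normalization is taken to be a simple mean of the two scores, the mapping score is the fraction of query tokens found in the description, and the threshold values, field names, and function names are hypothetical. The training method recited in the claim is not modeled.

```python
import math
import re


def cosine(a, b):
    """Cosine similarity; an assumed stand-in for a trained predictive model."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def tokenize(text):
    """Lowercase alphanumeric tokens (an assumed tokenization scheme)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def mapping_score(query, description):
    """Fraction of query tokens that also appear in the description (assumed metric)."""
    query_tokens = tokenize(query)
    if not query_tokens:
        return 0.0
    return len(query_tokens & tokenize(description)) / len(query_tokens)


def retrieve(query, query_vecs, items, sim_threshold=0.5, map_threshold=0.5):
    """Return ids of items that satisfy both the similarity and mapping thresholds."""
    by_similarity = set()
    by_mapping = set()
    for item in items:
        # First and second similarity scores, one per data type.
        first = cosine(query_vecs["text"], item["text_vec"])
        second = cosine(query_vecs["image"], item["image_vec"])
        # Normalize the two scores into one (a plain mean is assumed here).
        normalized = (first + second) / 2
        if normalized >= sim_threshold:
            by_similarity.add(item["id"])
        if mapping_score(query, item["description"]) >= map_threshold:
            by_mapping.add(item["id"])
    # Output only items identified in both sets.
    return sorted(by_similarity & by_mapping)


# Hypothetical toy repository with two-dimensional embeddings.
items = [
    {"id": "a", "description": "A space adventure with robots",
     "text_vec": [1.0, 0.0], "image_vec": [0.9, 0.1]},
    {"id": "b", "description": "Romantic comedy in Paris",
     "text_vec": [0.0, 1.0], "image_vec": [0.1, 0.9]},
    {"id": "c", "description": "Space robots documentary",
     "text_vec": [0.0, 1.0], "image_vec": [0.0, 1.0]},
]
query = "space robots"
query_vecs = {"text": [1.0, 0.0], "image": [1.0, 0.0]}

result = retrieve(query, query_vecs, items)
print(result)
```

In this toy data, item "c" matches the query keywords but its embeddings are orthogonal to the query embeddings, so it falls out at the intersection step. This illustrates why the claim requires an item to satisfy both the similarity score threshold and the mapping score threshold.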