| CPC G06V 20/47 (2022.01) [G06V 20/41 (2022.01)] | 20 Claims |

1. A system for evaluating relevance of a text string to a video comprising:
a processor; and
a non-transitory computer-readable medium having instructions executable by the processor for:
identifying a text embedding of the text string;
identifying a plurality of frame embeddings associated with a plurality of frames of the video;
evaluating the text embedding with respect to each frame embedding of the plurality of frame embeddings;
selecting a set of highest-relevance frames based on the evaluating;
generating a text-conditioned video embedding for the video by combining the plurality of frame embeddings associated with the set of highest-relevance frames without contribution of the frame embeddings not associated with the set of highest-relevance frames; and
determining a relevance score of the text string to the video based on the text-conditioned video embedding and the text embedding.
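The claim names the steps but not the specific operations. The sketch below is one possible reading, not the claimed implementation. It assumes several things the claim does not specify:

- The text and frame embeddings are already computed, for example by a jointly trained text-image encoder, and share one dimensionality.
- "Evaluating" is cosine similarity.
- The "set of highest-relevance frames" is a fixed top-k.
- "Combining" is a mean over only the selected frames.
- The final relevance score is the cosine similarity between the text embedding and the pooled video embedding.

All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (assumed scoring function); 0.0 if either norm is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def top_k_frames(text_emb: Sequence[float], frame_embs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Score every frame against the text; return indices of the k best (hypothetical top-k rule)."""
    if k <= 0 or not frame_embs:
        raise ValueError("need k >= 1 and at least one frame")
    scores = [cosine(text_emb, f) for f in frame_embs]
    ranked = sorted(range(len(frame_embs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def text_conditioned_video_embedding(
    text_emb: Sequence[float], frame_embs: Sequence[Sequence[float]], k: int
) -> Vector:
    """Mean-pool only the selected frames, so unselected frames contribute nothing."""
    selected = top_k_frames(text_emb, frame_embs, k)
    dim = len(frame_embs[0])
    pooled = [0.0] * dim
    for i in selected:
        for d in range(dim):
            pooled[d] += frame_embs[i][d]
    return [x / len(selected) for x in pooled]


def relevance_score(text_emb: Sequence[float], frame_embs: Sequence[Sequence[float]], k: int = 2) -> float:
    """Relevance of the text to the video via the text-conditioned video embedding."""
    video_emb = text_conditioned_video_embedding(text_emb, frame_embs, k)
    return cosine(text_emb, video_emb)


if __name__ == "__main__":
    # Toy 2-D embeddings: two on-topic frames, one orthogonal, one opposite.
    text = [1.0, 0.0]
    frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    print("top-2 frames:", top_k_frames(text, frames, 2))
    print("score, top-2 pooling:", relevance_score(text, frames, k=2))
    print("score, all-frame pooling:", relevance_score(text, frames, k=4))
```

In this toy example, frames 0 and 1 are selected. The top-2 score is close to 1, while pooling all four frames (k equal to the frame count, which reduces to plain mean pooling) gives a noticeably lower score because the off-topic frames dilute the video embedding. This dilution is what excluding non-selected frames is meant to avoid.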