| CPC H04N 19/30 (2014.11) [G06N 20/00 (2019.01); H04N 19/172 (2014.11)] | 10 Claims |

|
1. A computing system, the system comprising:
one or more processors;
one or more non-transitory computer readable media that collectively store instructions that, when executed by the one or more processors, cause the computing system to perform operations, the operations comprising:
obtaining a training dataset, wherein the training dataset comprises a search query, a ground-truth video, and a negative video-query pair, wherein the ground-truth video is responsive to the search query;
processing the ground-truth video with a machine-learned hierarchical video encoder model to generate a plurality of contextualized segment representations, wherein each contextualized segment representation of the plurality of contextualized segment representations comprises segment-level semantic information for a respective video segment;
determining a first video-query compatibility score based on the search query and the plurality of contextualized segment representations;
determining a second video-query compatibility score based on a respective video representation and respective query of the negative video-query pair;
evaluating a loss function that evaluates a difference between the first video-query compatibility score and the second video-query compatibility score; and
adjusting one or more parameters of the machine-learned hierarchical video encoder model based at least in part on the loss function.
|
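The training operations of claim 1 can be illustrated with a minimal, pure-Python sketch. The claim does not fix any of the concrete choices below, so all of them are assumptions made for illustration. The toy two-level encoder mean-pools frames into fixed-length segments, adds the mean of all segments as video-level context, and scales the result by learnable per-dimension weights. A video is scored by its best-matching segment. The negative pair's video representation comes from encoding the negative video with the same encoder. The loss is a hinge (margin) loss on the score difference, and finite-difference gradients stand in for backpropagation. Helper names such as `encode_video`, `compatibility`, and `train_step` are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> Vector:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def encode_video(frames: List[Vector], weights: Vector, segment_len: int = 2) -> List[Vector]:
    """Hypothetical toy hierarchical encoder: frames -> segments -> contextualized segments."""
    # Level 1: mean-pool consecutive frames into one feature vector per segment.
    segments = [mean(frames[i:i + segment_len]) for i in range(0, len(frames), segment_len)]
    # Level 2 (assumed): add video-level context (mean of all segments), then apply
    # learnable per-dimension weights; each output still corresponds to one segment.
    context = mean(segments)
    return [[w * (s + c) for w, s, c in zip(weights, seg, context)] for seg in segments]


def compatibility(query: Vector, segment_reps: List[Vector]) -> float:
    # Assumed scoring rule: the score of the best-matching segment.
    return max(dot(query, rep) for rep in segment_reps)


def margin_loss(pos: float, neg: float, margin: float = 1.0) -> float:
    # Hinge loss on the difference between the two compatibility scores.
    return max(0.0, margin - (pos - neg))


def loss_fn(weights, query, gt_frames, neg_query, neg_frames) -> float:
    s_pos = compatibility(query, encode_video(gt_frames, weights))
    # Assumption: the negative video is encoded with the same encoder.
    s_neg = compatibility(neg_query, encode_video(neg_frames, weights))
    return margin_loss(s_pos, s_neg)


def train_step(weights: Vector, batch, lr: float = 0.1, eps: float = 1e-4) -> Vector:
    # Central-difference numerical gradient, a stand-in for autodiff.
    grads = []
    for i in range(len(weights)):
        up = list(weights)
        up[i] += eps
        down = list(weights)
        down[i] -= eps
        grads.append((loss_fn(up, *batch) - loss_fn(down, *batch)) / (2 * eps))
    # Gradient descent on the encoder parameters.
    return [w - lr * g for w, g in zip(weights, grads)]


# Usage with made-up 2-D features.
query = [1.0, 0.0]
gt_frames = [[0.9, 0.1], [0.8, 0.0], [0.1, 0.9], [0.0, 1.0]]
neg_query = [0.0, 1.0]
neg_frames = [[1.0, 0.0], [0.9, 0.1]]
batch = (query, gt_frames, neg_query, neg_frames)

weights = [0.1, 0.1]
before = loss_fn(weights, *batch)
for _ in range(20):
    weights = train_step(weights, batch)
after = loss_fn(weights, *batch)
```

With the hinge form, updates push the ground-truth score above the negative-pair score by at least the margin and stop once that holds. A softmax-based contrastive loss over many negatives would be a common alternative, and nothing in the claim prefers one over the other.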