| CPC G06V 20/46 (2022.01) [G06F 16/732 (2019.01); G06V 10/761 (2022.01); G06V 10/774 (2022.01); G06V 20/49 (2022.01)] | 20 Claims |

|
1. A method for video moment retrieval, comprising:
obtaining video content and a textual query associated with a video moment, the video content comprising a plurality of video segments, the textual query comprising one or more words;
extracting a plurality of visual features for the video segments of the video content;
extracting one or more textual features for the one or more words in the textual query;
combining the visual features of the plurality of video segments and the textual features of the one or more words to generate a similarity matrix, each element of the similarity matrix representing a similarity level between a respective one of the video segments and a respective one of the one or more words;
generating one or more segment-attended sentence features for the textual query based on the one or more textual features for the one or more words in the textual query and the similarity matrix;
combining the visual features of the plurality of video segments and the segment-attended sentence features of the textual query to generate a plurality of alignment scores; and
retrieving a subset of the video content associated with the textual query from the video segments based on the plurality of alignment scores.
|
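The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: the claim does not specify how features are extracted, how similarity is measured, or how the final subset is selected. The sketch assumes that visual and textual features have already been extracted into arrays of a shared dimension, that similarity is cosine similarity, that the segment-attended sentence features are a softmax-weighted sum of word features, and that retrieval means taking the top-k segments by alignment score. The names `retrieve_moment`, `softmax`, `top_k`, and `eps` are hypothetical.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def retrieve_moment(visual, textual, top_k=1, eps=1e-8):
    """Sketch of the claimed pipeline under the assumptions stated above.

    visual:  (num_segments, dim) array, one feature vector per video segment
    textual: (num_words, dim) array, one feature vector per query word
    """
    # L2-normalise so that dot products are cosine similarities (assumption).
    v = visual / (np.linalg.norm(visual, axis=1, keepdims=True) + eps)
    w = textual / (np.linalg.norm(textual, axis=1, keepdims=True) + eps)

    # Similarity matrix: element [i, j] relates segment i to word j.
    similarity = v @ w.T

    # Segment-attended sentence features: for each segment, weight the word
    # features by that segment's attention over the words (assumption).
    attention = softmax(similarity, axis=1)
    sentence = attention @ textual

    # Alignment score per segment: cosine between the segment's visual
    # feature and its segment-attended sentence feature (assumption).
    s = sentence / (np.linalg.norm(sentence, axis=1, keepdims=True) + eps)
    scores = np.sum(v * s, axis=1)

    # Retrieval: indices of the top_k highest-scoring segments (assumption).
    top = np.argsort(-scores)[:top_k]
    return similarity, sentence, scores, top


# Usage with random stand-ins for features that a real system would extract.
rng = np.random.default_rng(0)
visual_features = rng.normal(size=(5, 8))   # 5 segments, 8-dim features
textual_features = rng.normal(size=(3, 8))  # 3 words, 8-dim features
similarity, sentence, scores, top = retrieve_moment(
    visual_features, textual_features, top_k=2
)
```

Here `similarity` has one row per segment and one column per word, `sentence` holds one query representation per segment, and `top` lists the segment indices that would be returned as the retrieved subset.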