US 12,423,977 B2
Method for video moment retrieval, computer system, non-transitory computer-readable medium
Jiawei Chen, Palo Alto, CA (US); and Jenhao Hsiao, Palo Alto, CA (US)
Assigned to INNOPEAK TECHNOLOGY, INC., Palo Alto, CA (US)
Filed by INNOPEAK TECHNOLOGY, INC., Palo Alto, CA (US)
Filed on Aug. 4, 2023, as Appl. No. 18/365,458.
Application 18/365,458 is a continuation of application No. PCT/US2021/019817, filed on Feb. 26, 2021.
Prior Publication US 2024/0037948 A1, Feb. 1, 2024
Int. Cl. G06V 20/40 (2022.01); G06F 16/732 (2019.01); G06V 10/74 (2022.01); G06V 10/774 (2022.01)
CPC G06V 20/46 (2022.01) [G06F 16/732 (2019.01); G06V 10/761 (2022.01); G06V 10/774 (2022.01); G06V 20/49 (2022.01)]
20 Claims
1. A method for video moment retrieval, comprising:
obtaining video content and a textual query associated with a video moment, the video content comprising a plurality of video segments, the textual query comprising one or more words;
extracting a plurality of visual features for the video segments of the video content;
extracting one or more textual features for the one or more words in the textual query;
combining the visual features of the plurality of video segments and the textual features of the one or more words to generate a similarity matrix, each element of the similarity matrix representing a similarity level between a respective one of the video segments and a respective one of the one or more words;
generating one or more segment-attended sentence features for the textual query based on the one or more textual features for the one or more words in the textual query and the similarity matrix;
combining the visual features of the plurality of video segments and the segment-attended sentence features of the textual query to generate a plurality of alignment scores; and
retrieving a subset of the video content associated with the textual query from the video segments based on the plurality of alignment scores.
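The steps recited in claim 1 can be illustrated with a minimal sketch. The claim does not specify how similarity, attention, or retrieval are computed, so the choices below are assumptions made purely for illustration: cosine similarity for the similarity matrix and the alignment scores, a softmax over words to form each segment-attended sentence feature, and top-k selection for retrieval. All function names are hypothetical, and the feature vectors are assumed to be already extracted and to share one dimensionality.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Assumption: cosine similarity stands in for the claimed "similarity level".
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def similarity_matrix(seg_feats, word_feats):
    # Element [i][j]: similarity between video segment i and query word j.
    return [[cosine(v, w) for w in word_feats] for v in seg_feats]

def segment_attended_sentence_features(word_feats, sim):
    # For each segment, weight the word features by that segment's row of
    # the similarity matrix (softmax weighting is an assumption).
    dim = len(word_feats[0])
    attended = []
    for row in sim:
        weights = softmax(row)
        attended.append([
            sum(wt * w[k] for wt, w in zip(weights, word_feats))
            for k in range(dim)
        ])
    return attended

def alignment_scores(seg_feats, attended):
    # One score per segment: segment feature vs. its attended sentence feature.
    return [cosine(v, q) for v, q in zip(seg_feats, attended)]

def retrieve(seg_feats, word_feats, top_k=1):
    # Returns the indices of the top_k segments (in temporal order) and all scores.
    sim = similarity_matrix(seg_feats, word_feats)
    attended = segment_attended_sentence_features(word_feats, sim)
    scores = alignment_scores(seg_feats, attended)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k]), scores

# Toy, hand-made 2-D features (not real extracted features).
segments = [[0.0, 1.0], [1.0, 0.05], [0.5, 0.5]]
words = [[1.0, 0.0], [0.9, 0.1]]
best, scores = retrieve(segments, words, top_k=1)
print(best, scores)
```

In this toy input, the second segment points in nearly the same direction as both word vectors, so it receives the highest alignment score. A trained system would learn the feature extractors and could replace the top-k rule with a score threshold or a contiguous-span selection; none of those details are stated in the claim.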