CPC G06F 40/30 (2020.01) [G06F 18/2411 (2023.01); G06F 40/205 (2020.01); G06V 20/41 (2022.01)] | 10 Claims |
1. A method for detecting a moment described by a sentence query in a video and including a memory for storing an input video and an input sentence query and a processor that executes instructions stored in the memory, the method comprising:
performing, by the processor, supervised learning in a neural network model using learning samples;
dividing, by the processor, the input video into units of chunks and generating a chunk-level feature sequence based on features that are extracted in a form of vectors from respective chunks;
dividing, by the processor, the input sentence query into units of words and generating a sentence-level feature sequence based on features that are extracted in a form of vectors from respective words;
generating, by the processor, a chunk-sentence relation feature sequence including contextual information of the video by extracting a relation between the chunk-level feature sequence and the sentence-level feature sequence;
estimating, by the processor, a temporal interval corresponding to the sentence query in the video based on the chunk-sentence relation feature sequence;
generating, by the processor, a proposal probability map for target contextual information related to the sentence-level feature sequence; and
detecting, by the processor, a temporal interval having a highest probability in the proposal probability map as the temporal interval corresponding to the sentence query in the video, wherein
the chunk-sentence relation feature sequence corresponds to one piece of information generated by integrating two pieces of information corresponding to interrelated chunk-level features and sentence feature vectors,
the proposal probability map is generated by combining a local proposal probability map and a global proposal probability map,
the local proposal probability map is generated by independently calculating probabilities that, for respective chunks, a corresponding chunk will be a start point, end point, and middle point of the temporal interval corresponding to the sentence query, based on the chunk-sentence relation feature sequence, and
the global proposal probability map is generated by directly exploiting the chunk-sentence relation feature sequence.
|
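The proposal-map steps recited in claim 1 can be illustrated with a minimal Python sketch. The claim does not specify how the per-chunk start, end, and middle probabilities are assembled into the local map, nor how the local and global maps are combined. The sketch therefore assumes two things: the local score of an interval is the start probability times the end probability times the mean middle probability over the interval, and the two maps are combined by an element-wise product. The function names and the sample values are hypothetical, and the global map is taken as a given input rather than computed from the chunk-sentence relation feature sequence.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def local_proposal_map(p_start: List[float],
                       p_end: List[float],
                       p_mid: List[float]) -> Matrix:
    """Build an n x n map where entry [i][j] scores the interval from chunk i to chunk j.

    Assumed scoring rule (not stated in the claim):
    p_start[i] * p_end[j] * mean(p_mid[i..j]).
    """
    n = len(p_start)
    # Prefix sums let us read the mean middle probability of any interval in O(1).
    prefix = [0.0]
    for p in p_mid:
        prefix.append(prefix[-1] + p)
    local = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # only intervals whose start is not after their end
            mean_mid = (prefix[j + 1] - prefix[i]) / (j - i + 1)
            local[i][j] = p_start[i] * p_end[j] * mean_mid
    return local


def combine_maps(local: Matrix, global_map: Matrix) -> Matrix:
    """Combine the two maps by element-wise product (an assumed combination rule)."""
    n = len(local)
    return [[local[i][j] * global_map[i][j] for j in range(n)] for i in range(n)]


def detect_interval(proposal: Matrix) -> Tuple[int, int]:
    """Return the (start_chunk, end_chunk) pair with the highest probability."""
    n = len(proposal)
    best = (0, 0)
    for i in range(n):
        for j in range(i, n):
            if proposal[i][j] > proposal[best[0]][best[1]]:
                best = (i, j)
    return best


# Hypothetical per-chunk probabilities for a four-chunk video.
p_start = [0.1, 0.8, 0.1, 0.1]
p_end = [0.1, 0.1, 0.7, 0.2]
p_mid = [0.1, 0.9, 0.8, 0.2]
# Hypothetical uniform global map over valid (start <= end) intervals.
global_map = [[0.5 if j >= i else 0.0 for j in range(4)] for i in range(4)]

local = local_proposal_map(p_start, p_end, p_mid)
proposal = combine_maps(local, global_map)
start_chunk, end_chunk = detect_interval(proposal)
```

With these sample values the highest-scoring interval runs from chunk 1 to chunk 2, since that pair has the strongest start, end, and middle probabilities. Other combination rules, such as a weighted sum of the two maps, would fit the claim language equally well.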