CPC H04N 21/4341 (2013.01) [G06F 16/732 (2019.01); G06F 16/7867 (2019.01); H04N 21/84 (2013.01); H04N 21/8456 (2013.01)] | 20 Claims |
1. A non-transitory computer readable medium comprising instructions that, when executed by at least one processor, cause the at least one processor to perform operations comprising:
sub-dividing a query video into visual segments and audio segments;
generating visual descriptors for the visual segments of the query video utilizing a visual neural network encoder;
generating audio descriptors for the audio segments of the query video utilizing an audio neural network encoder;
determining video segments from a plurality of known videos that are similar to the query video based on the visual descriptors and audio descriptors utilizing an inverse index by:
mapping the visual descriptors and the audio descriptors to codewords; and
identifying the video segments from the plurality of known videos based on the mapped codewords; and
identifying a known video of the plurality of known videos that corresponds to the query video from the determined video segments.
|
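The lookup recited in claim 1 (mapping descriptors to codewords, then retrieving segments of known videos through an inverse index) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the neural network encoders are not modeled, so the descriptors are small hand-written vectors; the codebooks, the function names `nearest_codeword`, `build_inverse_index`, and `match_query`, the nearest-centroid quantization, and the simple vote count used to pick the known video are all hypothetical choices.

```python
from collections import Counter, defaultdict
from math import dist


def nearest_codeword(descriptor, codebook):
    """Return the index of the codebook centroid closest to the descriptor."""
    return min(range(len(codebook)), key=lambda i: dist(descriptor, codebook[i]))


def build_inverse_index(known_videos, codebooks):
    """Map (modality, codeword) to postings of (video_id, modality, segment_idx)."""
    index = defaultdict(list)
    for video_id, modalities in known_videos.items():
        for modality, descriptors in modalities.items():
            for seg_idx, desc in enumerate(descriptors):
                word = nearest_codeword(desc, codebooks[modality])
                index[(modality, word)].append((video_id, modality, seg_idx))
    return index


def match_query(query, index, codebooks):
    """Return (best_video_id, segment_votes) for the query descriptors."""
    segment_votes = Counter()
    for modality, descriptors in query.items():
        for desc in descriptors:
            word = nearest_codeword(desc, codebooks[modality])
            # Every known segment sharing this codeword receives one vote.
            for posting in index.get((modality, word), []):
                segment_votes[posting] += 1
    # Aggregate segment votes per known video (an assumed scoring rule).
    video_votes = Counter()
    for (video_id, _modality, _seg_idx), count in segment_votes.items():
        video_votes[video_id] += count
    best = video_votes.most_common(1)[0][0] if video_votes else None
    return best, segment_votes


# Hypothetical codebooks: one list of centroid vectors per modality.
codebooks = {
    "visual": [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)],
    "audio": [(0.0,), (10.0,)],
}

# Hypothetical descriptors standing in for encoder outputs, one per segment.
known_videos = {
    "video_a": {"visual": [(0.1, 0.0), (0.9, 1.1)], "audio": [(0.5,)]},
    "video_b": {"visual": [(5.1, 4.9), (4.8, 5.2)], "audio": [(9.0,)]},
}
query = {"visual": [(1.1, 0.9), (0.0, 0.2)], "audio": [(1.0,)]}

index = build_inverse_index(known_videos, codebooks)
best, segment_votes = match_query(query, index, codebooks)
print(best)
print(dict(segment_votes))
```

In this toy data the query descriptors quantize to the same codewords as the segments of `video_a`, so that video collects all the votes. A practical system would presumably learn the codebooks from data and apply a more careful scoring or temporal-alignment step; the claim itself does not specify either.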