CPC G06V 20/46 (2022.01) [G06F 18/214 (2023.01); G06F 18/241 (2023.01); G06F 18/253 (2023.01); G06N 20/00 (2019.01); G06V 10/22 (2022.01); G06V 10/40 (2022.01); G06V 10/764 (2022.01); G06V 10/768 (2022.01); G06V 10/806 (2022.01); G06V 10/82 (2022.01); G06V 20/41 (2022.01); G06V 20/635 (2022.01); G06V 20/70 (2022.01); G10L 15/08 (2013.01); G06N 3/08 (2013.01); G06V 30/10 (2022.01)] | 16 Claims |
1. A video classification method, comprising:
extracting a keyword in a video according to multi-modal information of the video;
acquiring background knowledge corresponding to the keyword, and determining a text to be recognized according to the keyword and the background knowledge; and
classifying the text to be recognized to obtain a class of the video,
wherein the extracting a keyword in a video according to multi-modal information of the video comprises:
performing feature extraction on each piece of modal information in the multi-modal information, so as to obtain features corresponding to each piece of modal information;
fusing the features corresponding to each piece of modal information to obtain a fused feature; and
performing a word labeling according to the fused feature in the video to determine the keyword in the video,
wherein the multi-modal information comprises text content and visual information, the visual information comprises first visual information and second visual information, the first visual information is visual information corresponding to a text in a video frame in the video, the second visual information is a key frame in the video, and the performing feature extraction on each piece of modal information in the multi-modal information, so as to obtain features corresponding to each piece of modal information comprises:
performing a first text encoding operation on the text content to obtain a text feature;
performing a second text encoding operation on the first visual information to obtain a first visual feature; and
performing an image encoding operation on the second visual information to obtain a second visual feature.
|
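
The steps recited in claim 1 can be illustrated with a minimal, non-authoritative sketch. Everything below is an assumption made for illustration and is not taken from the claim: the `Video` container, the token-count and mean-brightness "encoders", fusion by concatenation, the threshold-based word labelling, and the `KNOWLEDGE` and `CLASS_CUES` tables are all hypothetical stand-ins for whatever models and knowledge sources an actual implementation would use. The first visual information is assumed to be already available as recognized text.

```python
import re
from collections import Counter
from dataclasses import dataclass

@dataclass
class Video:
    text_content: str  # text modality, e.g. a title or transcript
    frame_text: str    # first visual information: text shown in video frames
    key_frame: list    # second visual information: grayscale pixel rows (0-255)

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def encode_text(text):
    """Toy text encoder: token counts stand in for a learned text feature."""
    return Counter(tokenize(text))

def encode_image(frame):
    """Toy image encoder: mean brightness in [0, 1] stands in for an image feature."""
    pixels = [p for row in frame for p in row]
    return sum(pixels) / (255 * len(pixels))

def fuse(text_feat, first_visual_feat, second_visual_feat):
    """Fuse by concatenation: one (text, frame-text, image) vector per candidate word."""
    words = sorted(set(text_feat) | set(first_visual_feat))
    return {w: (text_feat[w], first_visual_feat[w], second_visual_feat) for w in words}

def label_words(fused, threshold=2.0):
    """Toy word labelling: keep words whose summed fused feature reaches the threshold."""
    return [w for w, vec in fused.items() if sum(vec) >= threshold]

def extract_keywords(video):
    text_feat = encode_text(video.text_content)    # first text encoding operation
    first_visual = encode_text(video.frame_text)   # second text encoding operation
    second_visual = encode_image(video.key_frame)  # image encoding operation
    return label_words(fuse(text_feat, first_visual, second_visual))

# Hypothetical background knowledge and class cue words (illustrative only).
KNOWLEDGE = {"dumpling": "a dish of dough wrapped around a filling, common in cooking"}
CLASS_CUES = {"food": {"dish", "cooking"}, "sports": {"match", "team"}}

def build_text(keywords):
    """Text to be recognized: each keyword joined with its background knowledge."""
    return " ".join(f"{k}: {KNOWLEDGE.get(k, '')}" for k in keywords)

def classify_text(text):
    """Toy classifier: pick the class whose cue words overlap the text most."""
    tokens = set(tokenize(text))
    best = max(CLASS_CUES, key=lambda c: len(CLASS_CUES[c] & tokens))
    return best if CLASS_CUES[best] & tokens else "unknown"

def classify_video(video):
    return classify_text(build_text(extract_keywords(video)))

video = Video("How to make dumpling at home", "Dumpling recipe", [[128, 255], [0, 128]])
print(extract_keywords(video))
print(classify_video(video))
```

In this toy example the word "dumpling" appears in both the text content and the frame text, so it is the only word labelled as a keyword. The video is then classified from the keyword plus its background knowledge rather than from the keyword alone, which is the point of the acquiring step in the claim.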