US 12,175,194 B2
Video classification method and apparatus, computer device, and storage medium
Bing Xin Qu, Shenzhen (CN); and Mao Zheng, Shenzhen (CN)
Assigned to TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED, Shenzhen (CN)
Filed by Tencent Technology (Shenzhen) Company Limited, Shenzhen (CN)
Filed on Mar. 4, 2021, as Appl. No. 17/192,580.
Application 17/192,580 is a continuation of application No. PCT/CN2019/116660, filed on Nov. 8, 2019.
Claims priority of application No. 201811535837.0 (CN), filed on Dec. 14, 2018.
Prior Publication US 2021/0192220 A1, Jun. 24, 2021
Int. Cl. G06F 40/30 (2020.01); G06F 18/22 (2023.01); G06F 18/2415 (2023.01); G06F 18/2431 (2023.01); G06F 40/279 (2020.01); G06V 10/764 (2022.01); G06V 10/80 (2022.01); G06V 10/82 (2022.01); G06V 20/40 (2022.01); G06V 20/62 (2022.01); G10L 25/24 (2013.01); G10L 25/57 (2013.01)
CPC G06F 40/30 (2020.01) [G06F 18/22 (2023.01); G06F 18/2415 (2023.01); G06F 18/2431 (2023.01); G06F 40/279 (2020.01); G06V 10/764 (2022.01); G06V 10/811 (2022.01); G06V 10/82 (2022.01); G06V 20/40 (2022.01); G06V 20/41 (2022.01); G06V 20/46 (2022.01); G06V 20/635 (2022.01); G10L 25/24 (2013.01); G10L 25/57 (2013.01)] 20 Claims
1. A video classification method, performed by a computer device, the method comprising:
obtaining a target video;
classifying an image frame in the target video by using a first classification model, to obtain an image classification result, the first classification model being configured to perform classification based on an image feature of the image frame;
classifying an audio in the target video by using a second classification model, to obtain an audio classification result, the second classification model being configured to perform classification based on an audio feature of the audio;
classifying textual description information corresponding to the target video by:
using a third classification model, to obtain a textual classification result, the third classification model being configured to perform classification based on a text feature of the textual description information;
obtaining the textual description information corresponding to the target video, the textual description information comprising at least one of a video title, video background music information, or video publisher information;
preprocessing the textual description information, wherein the preprocessing comprises at least one of de-noising, word segmentation, entity word retrieving, or stop word removal; and
classifying the preprocessed textual description information by using a Bi-directional long short-term memory network (Bi-LSTM) and a text classifier in the third classification model, to obtain the textual classification result; and
determining a target classification result of the target video according to the image classification result, the audio classification result, and the textual classification result.
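
The claim does not state how the three per-modality results are combined, nor how each preprocessing step is carried out. As an illustration only, the following minimal sketch assumes that each classification model returns a mapping from category label to probability, and that the target classification result is the label with the highest weighted average across the three modalities. The names `STOP_WORDS`, `preprocess`, `fuse`, and `target_class`, the stop-word list, and the weights are all hypothetical and do not appear in the patent. The Bi-LSTM and the text classifier are not implemented here; `preprocess` only illustrates de-noising, word segmentation, and stop word removal for whitespace-delimited text.

```python
import re
from typing import Dict, List, Sequence

# Hypothetical stop-word list; a real system would use a fuller, language-specific one.
STOP_WORDS = {"the", "a", "an", "of", "and"}


def preprocess(text: str) -> List[str]:
    """Illustrative preprocessing of textual description information."""
    # De-noising (assumed here to mean stripping punctuation and symbols).
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    # Word segmentation (assumed whitespace-delimited; Chinese text would need a segmenter).
    tokens = cleaned.split()
    # Stop word removal.
    return [t for t in tokens if t not in STOP_WORDS]


def fuse(
    image: Dict[str, float],
    audio: Dict[str, float],
    text: Dict[str, float],
    weights: Sequence[float] = (1.0, 1.0, 1.0),
) -> Dict[str, float]:
    """Weighted average of three per-modality probability maps (an assumed fusion rule)."""
    total = sum(weights)
    labels = set(image) | set(audio) | set(text)
    fused = {}
    for label in labels:
        # A label missing from one modality contributes probability 0.0 for that modality.
        score = (
            weights[0] * image.get(label, 0.0)
            + weights[1] * audio.get(label, 0.0)
            + weights[2] * text.get(label, 0.0)
        )
        fused[label] = score / total
    return fused


def target_class(fused: Dict[str, float]) -> str:
    """Pick the label with the highest fused score."""
    return max(fused, key=fused.get)


if __name__ == "__main__":
    tokens = preprocess("The Best Goals of 2018!!!")
    print(tokens)

    # Made-up example outputs standing in for the three classification models.
    image_result = {"sports": 0.7, "music": 0.3}
    audio_result = {"sports": 0.4, "music": 0.6}
    text_result = {"sports": 0.8, "music": 0.2}

    fused = fuse(image_result, audio_result, text_result)
    print(target_class(fused))
```

With equal weights, the example inputs above fuse to roughly 0.63 for `sports` and 0.37 for `music`, so `sports` is selected even though the audio result alone favors `music`. Other fusion rules, such as a learned classifier over the concatenated results, would equally fit the wording of the final step.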