| CPC G06V 20/41 (2022.01) [G06F 16/7834 (2019.01); G06V 20/46 (2022.01)] | 18 Claims |

|
1. A method, comprising:
performing semantic segmentation on to-be-processed video data to generate a corresponding semantic segmentation map, and extracting a semantic segmentation feature of the to-be-processed video data based on the semantic segmentation map;
extracting an audio feature of each audio file in a pre-established audio set; and
aligning the audio feature and the semantic segmentation feature, selecting a target audio file from the audio set based on an alignment result, and constructing background audio for the to-be-processed video data based on the target audio file,
wherein the aligning the audio feature and the semantic segmentation feature comprises:
performing dimension scaling processing on the audio feature and the semantic segmentation feature based on a preset feature dimension, to generate a target audio feature and a target semantic segmentation feature; and
aligning the target audio feature and the target semantic segmentation feature.
|
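The aligning step of claim 1 (scaling both features to a preset feature dimension, aligning them, then selecting a target audio file) can be illustrated with a minimal sketch. The claim does not specify how the dimension scaling or the alignment is computed, so everything concrete here is an assumption made for illustration: the names `scale_to_dim` and `select_target_audio` are hypothetical, the scaling is a fixed random linear projection followed by L2 normalization, and the alignment result is a cosine similarity score. A practical system would presumably use learned projections rather than random ones.

```python
import numpy as np


def scale_to_dim(feature: np.ndarray, target_dim: int, seed: int) -> np.ndarray:
    """Hypothetical dimension scaling: project the last axis to target_dim.

    Assumption: a fixed random linear projection stands in for whatever
    scaling the claimed method actually uses.
    """
    rng = np.random.default_rng(seed)
    in_dim = feature.shape[-1]
    projection = rng.standard_normal((in_dim, target_dim)) / np.sqrt(in_dim)
    scaled = feature @ projection
    # L2-normalize so that a dot product equals cosine similarity.
    norm = np.linalg.norm(scaled, axis=-1, keepdims=True)
    return scaled / np.maximum(norm, 1e-12)


def select_target_audio(seg_feature: np.ndarray,
                        audio_features: np.ndarray,
                        target_dim: int = 64):
    """Align features in a shared dimension and pick the best-scoring audio.

    seg_feature:    shape (d_seg,), semantic segmentation feature of the video
    audio_features: shape (n_audio, d_audio), one row per file in the audio set
    """
    target_seg = scale_to_dim(seg_feature, target_dim, seed=0)       # (target_dim,)
    target_audio = scale_to_dim(audio_features, target_dim, seed=1)  # (n_audio, target_dim)
    # Assumed alignment result: cosine similarity of each audio file to the video.
    scores = target_audio @ target_seg
    return int(np.argmax(scores)), scores


# Usage with synthetic features (placeholder data, not from the patent).
rng = np.random.default_rng(42)
seg_feature = rng.standard_normal(256)          # video-side feature
audio_features = rng.standard_normal((5, 128))  # five files in the audio set
best, scores = select_target_audio(seg_feature, audio_features)
print("selected audio index:", best)
```

The two inputs start with different dimensions (256 and 128 in the synthetic data), which is why both are first scaled to the same preset dimension before any similarity can be computed between them.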