| CPC G10L 25/60 (2013.01) [G10L 25/45 (2013.01); G10L 25/84 (2013.01); G10L 2025/783 (2013.01)] | 19 Claims |

|
1. A speech processing method, comprising:
acquiring a speech sequence, obtaining a plurality of speech sub-sequences by performing framing processing on the speech sequence, and extracting a target feature of each speech sub-sequence of the plurality of speech sub-sequences;
detecting each speech sub-sequence of the plurality of speech sub-sequences by a speech detection model according to each target feature, and determining valid speech based on a detection result;
inputting a target feature corresponding to the valid speech into a voiceprint recognition model, and screening out target speech from the valid speech by the voiceprint recognition model; and
controlling the target speech to be forwarded to a client;
wherein the voiceprint recognition model comprises a convolutional layer, a double-layer Long Short-Term Memory (LSTM) layer, a pooling layer, and an affine layer, and the inputting the target feature corresponding to the valid speech into the voiceprint recognition model and screening out the target speech from the valid speech by the voiceprint recognition model, comprises:
determining the target feature corresponding to the valid speech as a valid target feature;
inputting the valid target feature into the voiceprint recognition model, and obtaining a deep feature of the valid target feature by performing feature extraction on the valid target feature sequentially using the convolutional layer and the double-layer LSTM layer, wherein the deep feature comprises a time dimension and a feature dimension;
obtaining a maximum feature and a mean feature of the deep feature in the time dimension by inputting the deep feature into the pooling layer for feature extraction, and obtaining a hidden layer feature by summing the maximum feature and the mean feature; and
obtaining a speech representation vector of a valid speech sub-sequence corresponding to the valid target feature by inputting the hidden layer feature into the affine layer for affine transformation.
|
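
For illustration only, the framing step and the pooling and affine steps recited in claim 1 might be sketched as below. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names `frame_sequence` and `pool_and_project`, the `frame_len` and `hop` parameters, and the weight and bias values are all hypothetical, and the convolutional and double-layer LSTM layers are assumed to have already produced the deep feature as a time-by-feature matrix.

```python
from typing import List

# Rows are time steps, columns are the feature dimension (assumed layout).
Matrix = List[List[float]]


def frame_sequence(samples: List[float], frame_len: int, hop: int) -> Matrix:
    """Split a speech sequence into fixed-length sub-sequences (framing)."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    # Trailing samples that do not fill a whole frame are dropped here;
    # padding them instead would be an equally reasonable choice.
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def pool_and_project(deep: Matrix, weight: Matrix, bias: List[float]) -> List[float]:
    """Max and mean pooling over time, summed, followed by an affine map."""
    if not deep:
        raise ValueError("deep feature must contain at least one time step")
    columns = list(zip(*deep))                       # one tuple per feature
    max_feat = [max(col) for col in columns]         # maximum feature over time
    mean_feat = [sum(col) / len(col) for col in columns]  # mean feature over time
    hidden = [m + a for m, a in zip(max_feat, mean_feat)]  # hidden layer feature
    # Affine transformation: weight @ hidden + bias.
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(weight, bias)]


if __name__ == "__main__":
    frames = frame_sequence([1.0, 2.0, 3.0, 4.0, 5.0], frame_len=2, hop=2)
    print(frames)
    deep_feature = [[1.0, 4.0], [3.0, 2.0]]          # 2 time steps, 2 features
    w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]         # hypothetical 3x2 weights
    b = [0.0, 0.5, -1.0]                             # hypothetical bias
    print(pool_and_project(deep_feature, w, b))
```

Summing the maximum and mean features keeps the hidden layer feature at the same width as the deep feature's feature dimension, so the affine layer's input size does not depend on how many time steps the valid speech sub-sequence spans.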