CPC G10L 15/22 (2013.01) [G06N 3/08 (2013.01); G10L 13/00 (2013.01); G10L 15/02 (2013.01); G10L 15/16 (2013.01); G10L 19/00 (2013.01); G10L 25/24 (2013.01)] | 14 Claims |
1. An electronic device, comprising circuitry configured to:
window an audio input signal to extract features from the audio input signal,
generate a first senone symbol sequence by propagating the extracted features through a neural network,
decode the first senone symbol sequence to obtain a transcript of the audio input signal,
perform forced alignment on the transcript to obtain a second senone symbol sequence,
determine differences between the second senone symbol sequence and the first senone symbol sequence by computing a gradient of a loss function,
modify the gradient of the loss function by multiplying the gradient of the loss function with a predefined multiplication factor,
perform gradient descent based on the extracted features and the modified gradient of the loss function to generate enhanced features for the extracted features from the transcript of the audio input signal, and
perform vocoding based on the enhanced features to produce an enhanced audio signal.
|
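The gradient steps recited in claim 1 (computing a gradient of a loss function between the two senone symbol sequences, multiplying it by a predefined factor, and performing gradient descent on the extracted features) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the claim: a single linear layer with softmax stands in for the neural network, cross-entropy stands in for the loss function, the array `targets` stands in for the second senone symbol sequence obtained by forced alignment, and all function names, shapes, and values are hypothetical. Windowing, decoding, forced alignment, and vocoding are not shown.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(features, weights, targets):
    # Mean negative log-probability of the target senone in each frame.
    probs = softmax(features @ weights)
    return -np.mean(np.log(probs[np.arange(len(targets)), targets]))

def input_gradient(features, weights, targets):
    # Gradient of the mean cross-entropy with respect to the input features.
    # d(loss)/d(logits) = (probs - one_hot) / T, chained through
    # logits = features @ weights.
    probs = softmax(features @ weights)
    one_hot = np.eye(weights.shape[1])[targets]
    return (probs - one_hot) @ weights.T / len(targets)

def enhance_features(features, weights, targets, factor=0.5, steps=20):
    # Gradient descent on the features themselves; the model weights stay fixed.
    # `factor` plays the role of the predefined multiplication factor.
    enhanced = features.copy()
    for _ in range(steps):
        modified_grad = factor * input_gradient(enhanced, weights, targets)
        enhanced = enhanced - modified_grad
    return enhanced

# Hypothetical toy data: 5 frames, 4 feature dimensions, 3 senone classes.
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 4))
weights = rng.normal(size=(4, 3))
targets = np.array([0, 1, 2, 1, 0])  # stand-in for the force-aligned sequence

enhanced = enhance_features(features, weights, targets)
loss_before = cross_entropy(features, weights, targets)
loss_after = cross_entropy(enhanced, weights, targets)
```

In this sketch the updated features move the toy model's per-frame predictions toward the target sequence, so `loss_after` should be lower than `loss_before`; in the claimed device, the enhanced features would then be passed to a vocoder.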