| CPC G10L 19/16 (2013.01) [G06N 3/02 (2013.01)] | 20 Claims |

|
1. A method performed by a multi-band synchronized neural vocoder, comprising:
receiving an input audio signal to be processed by the multi-band synchronized neural vocoder;
separating, by the multi-band synchronized neural vocoder, the input audio signal into a plurality of frequency bands;
obtaining, by the multi-band synchronized neural vocoder, a plurality of audio signals that corresponds to the plurality of frequency bands, based on separating the input audio signal into the plurality of frequency bands;
downsampling, by the multi-band synchronized neural vocoder, each of the plurality of audio signals, based on obtaining the plurality of audio signals;
processing, by the multi-band synchronized neural vocoder, the downsampled audio signals; and
generating, by the multi-band synchronized neural vocoder, an audio output signal based on processing the downsampled audio signals,
wherein, in the multi-band synchronized neural vocoder, each of the frequency bands has its own fully connected layer and a corresponding softmax layer, and
wherein weight parameters of the multi-band synchronized neural vocoder are shared across the plurality of frequency bands except for final fully connected layers and softmax layers for each of the frequency bands.
|
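The separating, obtaining, and downsampling steps recited in claim 1 can be illustrated with a minimal sketch. The claim does not specify the filters, the number of bands, or the downsampling factor, so everything concrete here is an assumption made for illustration: two bands, two-tap averaging and differencing filters, and a downsampling factor equal to the number of bands. The function names (`fir_filter`, `split_into_bands`, `downsample`, `analyze`) are hypothetical and do not come from the claims.

```python
def fir_filter(signal, taps):
    """Causal FIR filter: y[n] = sum_k taps[k] * x[n - k], with zero initial state."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * signal[n - k]
        out.append(acc)
    return out


def split_into_bands(signal):
    """Separate the input into two frequency bands (assumed two-tap filters)."""
    low = fir_filter(signal, [0.5, 0.5])    # averaging filter keeps low frequencies
    high = fir_filter(signal, [0.5, -0.5])  # differencing filter keeps high frequencies
    return [low, high]


def downsample(signal, factor):
    """Keep every `factor`-th sample, starting with the first."""
    return signal[::factor]


def analyze(signal):
    """Separate into bands, then downsample each band signal."""
    bands = split_into_bands(signal)
    factor = len(bands)  # assumption: downsampling factor equals the band count
    return [downsample(band, factor) for band in bands]


if __name__ == "__main__":
    low_band, high_band = analyze([1.0] * 8)
    print(low_band)
    print(high_band)
```

With a constant input, the high band is zero after the initial filter transient, and each band signal has half as many samples as the input. A practical filter bank would likely use longer filters than the two-tap pair assumed here.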
|
8. A multi-band synchronized neural vocoder device, comprising:
at least one memory configured to store program code;
at least one processor configured to read the program code and operate as instructed by the program code, the program code including:
receiving code configured to cause the at least one processor to receive an input audio signal to be processed by the multi-band synchronized neural vocoder;
separating code configured to cause the at least one processor to separate the input audio signal into a plurality of frequency bands;
obtaining code configured to cause the at least one processor to obtain a plurality of audio signals that corresponds to the plurality of frequency bands, based on separating the input audio signal into the plurality of frequency bands;
downsampling code configured to cause the at least one processor to downsample each of the plurality of audio signals, based on obtaining the plurality of audio signals;
processing code configured to cause the at least one processor to process the downsampled audio signals; and
generating code configured to cause the at least one processor to generate an audio output signal based on processing the downsampled audio signals,
wherein, in the multi-band synchronized neural vocoder, each of the frequency bands has its own fully connected layer and a corresponding softmax layer, and
wherein weight parameters of the multi-band synchronized neural vocoder are shared across the plurality of frequency bands except for final fully connected layers and softmax layers for each of the frequency bands.
|
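The two wherein clauses describe a weight-sharing arrangement: one set of parameters serves all frequency bands, while each band has its own final fully connected layer and softmax layer. The sketch below is one possible reading, not the claimed implementation. The single `tanh` layer stands in for whatever shared network the vocoder uses, and the class name `MultiBandHeads`, the layer sizes, and the random initialization are all hypothetical.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random weights, for illustration only."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class MultiBandHeads:
    """Shared layer plus one fully connected + softmax head per band (assumed layout)."""

    def __init__(self, num_bands, in_dim, hidden_dim, num_classes, seed=0):
        rng = random.Random(seed)
        # Shared across all frequency bands.
        self.shared = random_matrix(hidden_dim, in_dim, rng)
        # Not shared: one final fully connected layer per band.
        self.heads = [
            random_matrix(num_classes, hidden_dim, rng) for _ in range(num_bands)
        ]

    def forward(self, features):
        """Return one probability distribution per band from a single shared state."""
        hidden = [math.tanh(v) for v in matvec(self.shared, features)]
        return [softmax(matvec(head, hidden)) for head in self.heads]


if __name__ == "__main__":
    model = MultiBandHeads(num_bands=4, in_dim=8, hidden_dim=16, num_classes=256)
    distributions = model.forward([0.1] * 8)
    print(len(distributions), len(distributions[0]))
```

Because the shared layer is evaluated once per step and only the per-band heads are repeated, adding a band in this sketch adds only the cost of one more head.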