US 12,308,039 B2
	Multi-band synchronized neural vocoder
Chengzhu Yu, Bellevue, WA (US); Meng Yu, Bellevue, WA (US); Heng Lu, Sammamish, WA (US); and Dong Yu, Bothell, WA (US)
Assigned to TENCENT AMERICA LLC, Palo Alto, CA (US)
Filed by TENCENT AMERICA LLC, Palo Alto, CA (US)
Filed on Mar. 4, 2022, as Appl. No. 17/687,266.
Application 17/687,266 is a continuation of application No. 16/576,943, filed on Sep. 20, 2019, granted, now 11,295,751.
Prior Publication US 2022/0189495 A1, Jun. 16, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 19/00 (2013.01); G06N 3/02 (2006.01); G10L 19/16 (2013.01)

CPC G10L 19/16 (2013.01) [G06N 3/02 (2013.01)]

20 Claims

1. A method performed by a multi-band synchronized neural vocoder, comprising:

receiving an input audio signal to be processed by the multi-band synchronized neural vocoder;

separating, by the multi-band synchronized neural vocoder, the input audio signal into a plurality of frequency bands;

obtaining, by the multi-band synchronized neural vocoder, a plurality of audio signals that correspond to the plurality of frequency bands, based on separating the input audio signal into the plurality of frequency bands;

downsampling, by the multi-band synchronized neural vocoder, each of the plurality of audio signals, based on obtaining the plurality of audio signals;

processing, by the multi-band synchronized neural vocoder, the downsampled audio signals; and

generating, by the multi-band synchronized neural vocoder, an audio output signal based on processing the downsampled audio signals,

wherein, in the multi-band synchronized neural vocoder, each of the frequency bands has its own fully connected layer and a corresponding softmax layer, and

wherein weight parameters of the multi-band synchronized neural vocoder are shared across the plurality of frequency bands except for final fully connected layers and softmax layers for each of the frequency bands.
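Read as an architecture, claim 1 describes a sub-band pipeline: split the signal into frequency bands, downsample each band, run one network whose weights are shared across bands, and give every band its own final fully connected layer and softmax. The Python/NumPy sketch below illustrates one possible reading under stated assumptions. A brick-wall FFT split stands in for the analysis filter bank, which the claims leave unspecified (a real system might use something like a PQMF bank). Downsampling is plain decimation by the number of bands. The shared network is reduced to a single tanh layer that takes one downsampled sample from every band at the same step. All names (split_bands, downsample, MultiBandHeads) and sizes (64 hidden units, 256 output classes, e.g. 8-bit mu-law levels) are hypothetical and not taken from the patent.

```python
import numpy as np


def split_bands(x: np.ndarray, num_bands: int) -> np.ndarray:
    """Split a real signal into num_bands equal-width frequency bands.

    Assumption: brick-wall masking in the FFT domain stands in for the
    vocoder's (unspecified) analysis filter bank. Because the masks tile the
    spectrum, the bands sum back to the input.
    """
    spectrum = np.fft.rfft(x)
    edges = np.linspace(0, len(spectrum), num_bands + 1).astype(int)
    bands = []
    for k in range(num_bands):
        masked = np.zeros_like(spectrum)
        masked[edges[k]:edges[k + 1]] = spectrum[edges[k]:edges[k + 1]]
        bands.append(np.fft.irfft(masked, n=len(x)))
    return np.stack(bands)  # shape: (num_bands, len(x))


def downsample(bands: np.ndarray, factor: int) -> np.ndarray:
    """Plain decimation (assumption): keep every factor-th sample of each band."""
    return bands[:, ::factor]


class MultiBandHeads:
    """Shared layer plus one final fully connected layer per band (hypothetical sizes)."""

    def __init__(self, num_bands: int, hidden: int = 64, classes: int = 256, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Shared weights: a single layer whose output feeds every band's head.
        self.w_shared = rng.standard_normal((num_bands, hidden)) * 0.1
        self.b_shared = np.zeros(hidden)
        # Band-specific weights: one final fully connected layer per band.
        self.w_heads = rng.standard_normal((num_bands, hidden, classes)) * 0.1
        self.b_heads = np.zeros((num_bands, classes))

    def step(self, frame: np.ndarray) -> np.ndarray:
        """Predict all bands at once from one downsampled sample per band."""
        h = np.tanh(frame @ self.w_shared + self.b_shared)  # shared hidden state
        logits = np.einsum("h,bhc->bc", h, self.w_heads) + self.b_heads
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        return probs / probs.sum(axis=1, keepdims=True)  # per-band softmax

    def param_counts(self) -> tuple[int, int]:
        """Return (shared parameters, parameters of each band's own head)."""
        shared = self.w_shared.size + self.b_shared.size
        per_band = self.w_heads[0].size + self.b_heads[0].size
        return shared, per_band


# Usage: 4 bands, 1024 input samples -> 4 sub-band signals of 256 samples each.
x = np.random.default_rng(1).standard_normal(1024)
bands = split_bands(x, num_bands=4)
sub = downsample(bands, factor=4)
model = MultiBandHeads(num_bands=4)
probs = model.step(sub[:, 0])  # one distribution over 256 classes per band
```

In this sketch the shared hidden state is computed once per step and read by every band's head, so all bands are predicted in lockstep. Each additional band adds only its own hidden-by-classes output layer. Because each band is decimated by the number of bands, the total number of samples the network must generate stays the same as for the full-band signal. The difference is that they are produced several at a time.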

8. A multi-band synchronized neural vocoder device, comprising:

at least one memory configured to store program code;

at least one processor configured to read the program code and operate as instructed by the program code, the program code including:

receiving code configured to cause the at least one processor to receive an input audio signal to be processed by the multi-band synchronized neural vocoder;

separating code configured to cause the at least one processor to separate the input audio signal into a plurality of frequency bands;

obtaining code configured to cause the at least one processor to obtain a plurality of audio signals that correspond to the plurality of frequency bands, based on separating the input audio signal into the plurality of frequency bands;

downsampling code configured to cause the at least one processor to downsample each of the plurality of audio signals, based on obtaining the plurality of audio signals;

processing code configured to cause the at least one processor to process the downsampled audio signals; and

generating code configured to cause the at least one processor to generate an audio output signal based on processing the downsampled audio signals,

wherein, in the multi-band synchronized neural vocoder, each of the frequency bands has its own fully connected layer and a corresponding softmax layer, and