| CPC G06N 3/08 (2013.01) [G10L 19/028 (2013.01); G10L 19/038 (2013.01); G10L 25/30 (2013.01); G10L 25/60 (2013.01); G10L 25/69 (2013.01); G06N 3/084 (2013.01); G10L 15/00 (2013.01); G10L 19/00 (2013.01); G10L 19/22 (2013.01)] | 12 Claims |

|
1. A method of designing a neural network-based audio codec, the method comprising:
generating a quantized latent vector and a reconstructed signal corresponding to an input signal by using a white noise modeling-based quantization process;
computing a total loss for training of the neural network-based audio codec, based on the input signal, the reconstructed signal, and the quantized latent vector;
training the neural network-based audio codec by using the total loss; and
validating the trained neural network-based audio codec to select the best neural network-based audio codec,
wherein the computing of the total loss comprises:
calculating a reconstruction loss term as mean squared error (MSE) between the input signal and the reconstructed signal, a bit-rate control loss term as an entropy of the quantized latent vector, and a perceptual loss term reflecting human perceptual characteristics, respectively; and
calculating the total loss by adding the reconstruction loss term, the bit-rate control loss term, and the perceptual loss term,
wherein the reconstruction loss term is determined based on a square of an L2-norm of a difference between the input signal and the reconstructed signal,
wherein the bit-rate control loss term is determined based on a probability distribution for a latent vector with added random noise,
wherein the latent vector with added random noise is generated by adding random noise to a latent vector output from an encoder of the neural network-based audio codec that receives the input signal, and
wherein the quantized latent vector is generated by de-warping the latent vector with added random noise.
|
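The loss computation recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and every name in it is hypothetical. It rests on several assumptions the claim does not state: the random noise is uniform on [-0.5, 0.5), the probability distribution of the noisy latent vector is a fixed Gaussian integrated over a unit-width bin, and the perceptual loss term is supplied from outside as a plain number. The de-warping step that produces the quantized latent vector is left out because the claim does not define it.

```python
import math
import random


def reconstruction_loss(x, x_hat):
    """MSE: squared L2-norm of (x - x_hat), divided by the number of samples."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def add_noise(z, rng):
    """Add random noise to the encoder output z.

    Assumption: uniform noise on [-0.5, 0.5); the claim only says "random noise".
    """
    return [v + rng.uniform(-0.5, 0.5) for v in z]


def _gaussian_cdf(v, mu, sigma):
    return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))


def rate_loss(z_noisy, mu=0.0, sigma=1.0):
    """Entropy-style bit-rate term: mean of -log2 p(z_noisy) per latent element.

    Assumption: p is a fixed Gaussian (mu, sigma) integrated over a unit-width
    bin centered on each noisy value. A real codec would likely learn p.
    """
    bits = 0.0
    for v in z_noisy:
        p = _gaussian_cdf(v + 0.5, mu, sigma) - _gaussian_cdf(v - 0.5, mu, sigma)
        bits += -math.log2(max(p, 1e-12))  # clamp to avoid log2(0)
    return bits / len(z_noisy)


def total_loss(x, x_hat, z_noisy, perceptual_term):
    """Sum of the three terms, as recited in the claim (no weights assumed)."""
    return reconstruction_loss(x, x_hat) + rate_loss(z_noisy) + perceptual_term


if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.10, -0.20, 0.30, 0.05]        # input signal (toy values)
    x_hat = [0.12, -0.18, 0.25, 0.00]    # reconstructed signal (toy values)
    z_noisy = add_noise([0.4, -1.2, 2.0], rng)
    print(total_loss(x, x_hat, z_noisy, perceptual_term=0.01))
```

In a trainable codec these terms would be computed on differentiable tensors so that the total loss can drive backpropagation. Plain Python lists are used here only to keep the arithmetic visible.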