CPC G10H 1/0091 (2013.01) [G06N 3/0442 (2023.01); G06N 3/045 (2023.01); G06N 3/0499 (2023.01); G06N 3/08 (2013.01); G10H 1/16 (2013.01); G10H 2210/215 (2013.01); G10H 2210/281 (2013.01); G10H 2210/311 (2013.01); G10H 2250/025 (2013.01); G10H 2250/311 (2013.01)]

20 Claims

1. A computer-implemented method of processing audio data, the method comprising:
receiving input audio data (x) comprising a time-series of amplitude values;
transforming the input audio data (x) into an input frequency band decomposition (X1) of the input audio data (x);
transforming the input frequency band decomposition (X1) into a first latent representation (Z);
processing the first latent representation (Z) by a first deep neural network to obtain a second latent representation (Z′, Z1′);
transforming the second latent representation (Z′, Z1′) to obtain a discrete approximation (X3′);
element-wise multiplying the discrete approximation (X3′) and a residual feature map (R, X5′) to obtain a modified feature map, wherein the residual feature map (R, X5′) is derived from the input frequency band decomposition (X1);
processing a pre-shaped frequency band decomposition by a waveshaping unit to obtain a waveshaped frequency band decomposition (X1′, X1.2′), wherein the pre-shaped frequency band decomposition is derived from the input frequency band decomposition (X1), wherein the waveshaping unit comprises a second deep neural network;
summing the waveshaped frequency band decomposition (X1′, X1.2′) and a modified frequency band decomposition (X2′, X1.1′) to obtain a summation output (X0′), wherein the modified frequency band decomposition (X2′, X1.1′) is derived from the modified feature map; and
transforming the summation output (X0′) to obtain target audio data (y′).
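For orientation, the following is a minimal, runnable sketch of the signal flow recited in claim 1. It assumes a plain framewise rFFT as the frequency band decomposition, small randomly initialised multilayer perceptrons standing in for the two deep neural networks, a rounded sigmoid mask as the discrete approximation (X3′), and the decomposition X1 itself as the residual feature map (R). None of these concrete choices (frame length, layer sizes, activations, the mask quantisation) are taken from the patent; they are illustrative assumptions only.

```python
# Sketch of the claim-1 pipeline under the assumptions stated above.
import numpy as np

rng = np.random.default_rng(0)
FRAME = 256                      # samples per frame (assumption)
BANDS = FRAME // 2 + 1           # rFFT bins per frame
LATENT = 32                      # latent width (assumption)

def mlp(sizes):
    """Random-weight MLP returning a forward function (stand-in for a trained DNN)."""
    params = [(rng.standard_normal((i, o)) * 0.1, np.zeros(o))
              for i, o in zip(sizes[:-1], sizes[1:])]
    def forward(h):
        for k, (W, b) in enumerate(params):
            h = h @ W + b
            if k < len(params) - 1:
                h = np.tanh(h)
        return h
    return forward

encoder = mlp([BANDS, LATENT])           # X1 magnitudes -> first latent Z
dnn1    = mlp([LATENT, LATENT, LATENT])  # first deep neural network: Z -> Z'
decoder = mlp([LATENT, BANDS])           # Z' -> mask logits for X3'
dnn2    = mlp([1, 16, 1])                # second DNN: per-sample waveshaper

def process(x):
    n = len(x) - len(x) % FRAME
    frames = x[:n].reshape(-1, FRAME)
    X1 = np.fft.rfft(frames, axis=1)         # input frequency band decomposition (X1)

    Z   = encoder(np.abs(X1))                # first latent representation (Z)
    Zp  = dnn1(Z)                            # second latent representation (Z')
    X3p = np.round(1 / (1 + np.exp(-decoder(Zp))))   # discrete 0/1 approximation (X3')

    R = X1                                   # residual feature map derived from X1
    X2p = X3p * R                            # element-wise multiply -> modified
                                             # frequency band decomposition (X2')

    pre = np.fft.irfft(X1, n=FRAME, axis=1)  # pre-shaped decomposition (time domain)
    shaped = dnn2(pre.reshape(-1, 1)).reshape(pre.shape)  # waveshaping unit
    X1p = np.fft.rfft(shaped, axis=1)        # waveshaped decomposition (X1')

    X0p = X1p + X2p                          # summation output (X0')
    return np.fft.irfft(X0p, n=FRAME, axis=1).ravel()    # target audio data (y')

y = process(rng.standard_normal(4096))
print(y.shape)
```

Windowing and overlap-add are omitted for brevity; a practical realisation of the transform and its inverse would use an overlapping analysis/synthesis filterbank, and the networks would of course be trained rather than randomly initialised.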