US 12,475,359 B2
Method and apparatus for determining parameters of a generative neural network
Arijit Biswas, Erlangen (DE); and Simon Plain, Nuremberg (DE)
Assigned to DOLBY INTERNATIONAL AB, Dublin (IE)
Appl. No. 17/927,929
Filed by DOLBY INTERNATIONAL AB, Dublin (IE)
PCT Filed May 31, 2021, PCT No. PCT/EP2021/064511
§ 371(c)(1), (2) Date Nov. 28, 2022,
PCT Pub. No. WO2021/245015, PCT Pub. Date Dec. 9, 2021.
Claims priority of provisional application 63/177,511, filed on Apr. 21, 2021.
Claims priority of provisional application 63/032,903, filed on Jun. 1, 2020.
Claims priority of application No. 20181683 (EP), filed on Jun. 23, 2020.
Prior Publication US 2023/0229892 A1, Jul. 20, 2023
Int. Cl. G10L 15/16 (2006.01); G06N 3/0455 (2023.01); G06N 3/082 (2023.01); G10L 19/26 (2013.01); G10L 25/30 (2013.01); G10L 25/69 (2013.01)
CPC G06N 3/0455 (2023.01) [G06N 3/082 (2013.01); G10L 19/26 (2013.01); G10L 25/30 (2013.01); G10L 25/69 (2013.01)] 21 Claims
 
8. A method of generating, in a dynamic range reduced domain, enhanced audio data from a low-bitrate audio bitstream, wherein the method includes the steps of:
(a) receiving the low-bitrate audio bitstream;
(b) core decoding the low-bitrate audio bitstream and obtaining dynamic range reduced raw audio data based on the low-bitrate audio bitstream;
(c) inputting the dynamic range reduced raw audio data into a Generator of a Generative Adversarial Network, GAN, for processing the dynamic range reduced raw audio data, wherein the Generator includes an encoder stage and a decoder stage, wherein the encoder stage and the decoder stage each include a plurality of layers with one or more filters in each layer, wherein each filter includes one or more weights, wherein a bottleneck layer of the encoder stage of the Generator maps to a coded audio feature space between the encoder stage and the decoder stage, wherein one or more layers of the encoder stage and/or the decoder stage adjacent to the bottleneck layer are more sparse than the bottleneck layer, and wherein the bottleneck layer is more sparse than one or more outer layers of the encoder stage and/or the decoder stage, wherein sparsity is determined by a percentage of zero-valued weights, and wherein the one or more layers of the encoder stage and/or the decoder stage adjacent to the bottleneck layer have a higher percentage of zero-valued weights than the bottleneck layer,
wherein the Generator is obtainable by pruning, wherein the pruning includes the steps of:
(i) pruning the encoder stage and/or the decoder stage based on a set of sensitivity parameters that indicate thresholds for the pruning; and
(ii) pruning the bottleneck layer of the encoder stage based on the set of sensitivity parameters,
and wherein the pruning includes zeroing one or more weights based on the set of sensitivity parameters;
(d) enhancing the dynamic range reduced raw audio data by the Generator in the dynamic range reduced domain;
(e) obtaining, as an output from the Generator, enhanced dynamic range reduced audio data for subsequent expansion of the dynamic range; and
(f) expanding the enhanced dynamic range reduced audio data to the expanded dynamic range domain by performing an expansion operation.
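The sparsity profile and the sensitivity-based pruning recited in step (c) can be illustrated with a short Python sketch. Sparsity is computed as the claim defines it, the percentage of zero-valued weights in a layer. The claim only says that the sensitivity parameters indicate pruning thresholds. The sketch therefore assumes that each layer's threshold is its sensitivity parameter multiplied by the standard deviation of that layer's weights, a common magnitude-pruning heuristic. The layer names, sizes and sensitivity values are hypothetical.

```python
import random
import statistics
from typing import Dict, List


def sparsity(weights: List[float]) -> float:
    """Sparsity as the percentage of zero-valued weights (the claim's definition)."""
    return 100.0 * sum(1 for w in weights if w == 0.0) / len(weights)


def prune_layer(weights: List[float], sensitivity: float) -> List[float]:
    """Zero every weight whose magnitude falls below a per-layer threshold.

    ASSUMPTION: threshold = sensitivity * population std of the layer's weights.
    The claim does not fix how a sensitivity parameter maps to a threshold.
    """
    threshold = sensitivity * statistics.pstdev(weights)
    return [0.0 if abs(w) < threshold else w for w in weights]


def prune_generator(layers: Dict[str, List[float]],
                    sensitivities: Dict[str, float]) -> Dict[str, List[float]]:
    """Prune every named layer with its own sensitivity parameter."""
    return {name: prune_layer(w, sensitivities[name]) for name, w in layers.items()}


def has_claimed_profile(sp: Dict[str, float], outer: List[str],
                        bottleneck: str, adjacent: List[str]) -> bool:
    """True if each adjacent layer is sparser than the bottleneck layer and
    the bottleneck layer is sparser than each outer layer."""
    return (all(sp[a] > sp[bottleneck] for a in adjacent)
            and all(sp[bottleneck] > sp[o] for o in outer))


if __name__ == "__main__":
    # Hypothetical toy Generator: layer names and sizes are illustrative only.
    rng = random.Random(0)
    names = ["enc_outer", "enc_adjacent", "bottleneck", "dec_adjacent", "dec_outer"]
    layers = {n: [rng.gauss(0.0, 1.0) for _ in range(2000)] for n in names}
    # Hypothetical sensitivities: prune hardest next to the bottleneck layer.
    sens = {"enc_outer": 0.25, "enc_adjacent": 1.0, "bottleneck": 0.5,
            "dec_adjacent": 1.0, "dec_outer": 0.25}
    pruned = prune_generator(layers, sens)
    sp = {n: sparsity(w) for n, w in pruned.items()}
    for n in names:
        print(f"{n:13s} {sp[n]:5.1f}% zeros")
    print("profile holds:",
          has_claimed_profile(sp, ["enc_outer", "dec_outer"], "bottleneck",
                              ["enc_adjacent", "dec_adjacent"]))
```

In this sketch, the largest sensitivity goes to the layers adjacent to the bottleneck, a middle value to the bottleneck layer, and the smallest value to the outer layers. That assignment is one way to reach the ordering the claim recites. With the Gaussian toy weights, the adjacent, bottleneck and outer layers end up at roughly 68%, 38% and 20% zeros, respectively.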
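In steps (d) to (f), the Generator operates entirely in the dynamic range reduced domain, and the expansion is applied only to its output. The claim does not name the reduction or expansion operation. For illustration only, the sketch below therefore assumes mu-law companding with a hypothetical parameter MU = 255. A hypothetical `generator` callable stands in for the pruned Generator.

```python
import math
from typing import Callable, List

MU = 255.0  # ASSUMPTION: mu-law parameter chosen for illustration only


def reduce_dynamic_range(x: List[float], mu: float = MU) -> List[float]:
    """Mu-law compression of samples in [-1, 1]; a stand-in for the
    (unspecified) dynamic range reduction."""
    return [math.copysign(math.log1p(mu * abs(s)) / math.log1p(mu), s) for s in x]


def expand_dynamic_range(y: List[float], mu: float = MU) -> List[float]:
    """Exact inverse of reduce_dynamic_range; the expansion operation of
    step (f) under the mu-law assumption."""
    return [math.copysign(math.expm1(abs(s) * math.log1p(mu)) / mu, s) for s in y]


def enhance_pipeline(reduced_raw: List[float],
                     generator: Callable[[List[float]], List[float]]) -> List[float]:
    """Run the Generator in the reduced domain (steps (d)-(e)), then expand
    its output back to the expanded dynamic range domain (step (f))."""
    enhanced_reduced = generator(reduced_raw)
    return expand_dynamic_range(enhanced_reduced)


if __name__ == "__main__":
    samples = [-1.0, -0.5, 0.0, 0.01, 0.5, 1.0]
    reduced = reduce_dynamic_range(samples)  # small magnitudes are boosted
    # Identity function as a hypothetical placeholder for the trained Generator.
    restored = enhance_pipeline(reduced, lambda v: v)
    print([round(v, 4) for v in reduced])
    print([round(v, 6) for v in restored])
```

With an identity placeholder, the pipeline returns the input samples up to floating-point rounding. This shows that the expansion undoes the reduction. The enhancement itself would come entirely from the trained Generator.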