US 12,223,967 B2
Audio decoder, apparatus for determining a set of values defining characteristics of a filter, methods for providing a decoded audio representation, methods for determining a set of values defining characteristics of a filter and computer program
Guillaume Fuchs, Erlangen (DE); Srikanth Korse, Erlangen (DE); and Emmanuel Ravelli, Erlangen (DE)
Assigned to Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V., Munich (DE)
Filed by Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V., Munich (DE)
Filed on Oct. 7, 2021, as Appl. No. 17/496,560.
Application 17/496,560 is a continuation of application No. PCT/EP2020/060148, filed on Apr. 9, 2020.
Prior Publication US 2022/0223161 A1, Jul. 14, 2022
Int. Cl. G10L 19/02 (2013.01); G06N 3/048 (2023.01); G06N 3/08 (2023.01); G10L 21/0316 (2013.01)

CPC G10L 19/02 (2013.01) [G06N 3/048 (2023.01); G06N 3/08 (2013.01); G10L 21/0316 (2013.01)]

26 Claims

1. An audio decoder for providing a decoded audio representation on the basis of an encoded audio representation,

wherein the audio decoder comprises a filter for providing an enhanced audio representation of the decoded audio representation,

wherein the filter is configured to acquire a plurality of scaling values, which are associated with different frequency bins or frequency ranges, on the basis of spectral values of the decoded audio representation which are associated with different frequency bins or frequency ranges, and

wherein the filter is configured to scale spectral values of the decoded audio signal representation, or a pre-processed version thereof, using the scaling values, to acquire the enhanced audio representation;

wherein the filter comprises a neural network or a machine learning structure configured to provide the scaling values on the basis of a plurality of spectral values describing the decoded audio representation, spectral values which are associated with different frequency bins or frequency ranges;

wherein the filter is configured to normalize input features of the neural network or of the machine learning structure to a predetermined mean value and/or to a predetermined variance or standard deviation.
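
The filter of claim 1 can be pictured as three steps: derive input features from the decoded spectrum, normalize them to a predetermined mean and standard deviation, and multiply each spectral value by a scaling value produced by a learned model. The following is a minimal sketch under stated assumptions, not the patented implementation: the log-magnitude feature, the target mean of 0 and standard deviation of 1, and the function `toy_scaling_model` (a stand-in for the neural network) are all hypothetical choices made for illustration.

```python
import math


def normalize(features, target_mean=0.0, target_std=1.0, eps=1e-12):
    """Shift and rescale features to a predetermined mean and standard deviation."""
    n = len(features)
    mean = sum(features) / n
    var = sum((x - mean) ** 2 for x in features) / n
    std = math.sqrt(var) + eps  # eps avoids division by zero for constant input
    return [target_mean + target_std * (x - mean) / std for x in features]


def toy_scaling_model(features):
    """Hypothetical stand-in for the neural network: one gain per frequency bin.

    A sigmoid stretched to the open interval (0, 2), so a gain can either
    attenuate or amplify the corresponding spectral value.
    """
    return [2.0 / (1.0 + math.exp(-x)) for x in features]


def enhance_frame(spectrum, model=toy_scaling_model):
    """Scale each spectral value by a gain derived from the decoded spectrum."""
    # Assumed input feature: log-magnitude per frequency bin.
    features = [math.log(abs(v) + 1e-9) for v in spectrum]
    gains = model(normalize(features))
    enhanced = [g * v for g, v in zip(gains, spectrum)]
    return enhanced, gains


# Illustrative decoded spectral values for one frame (made-up numbers).
decoded = [0.5, -1.0, 2.0, 0.25]
enhanced, gains = enhance_frame(decoded)
```

Because the gains in this sketch are strictly positive, each enhanced value keeps the sign of the decoded value it was derived from; only the magnitude changes.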

22. A method for providing an enhanced audio representation on the basis of an encoded audio representation,

wherein the method comprises providing a decoded audio representation of the encoded audio representation,

wherein the method comprises acquiring a plurality of scaling values, which are associated with different frequency bins or frequency ranges, on the basis of spectral values of the decoded audio representation which are associated with different frequency bins or frequency ranges, and wherein the method comprises scaling spectral values of the decoded audio signal representation, or a pre-processed version thereof, using the scaling values, to acquire the enhanced audio representation;

wherein the method comprises using a neural network or a machine learning to provide the scaling values on the basis of a plurality of spectral values describing the decoded audio representation, spectral values which are associated with different frequency bins or frequency ranges;

wherein the method comprises normalizing input features of the neural network or of the machine learning to a predetermined mean value and/or to a predetermined variance or standard deviation.

24. An audio decoder for providing a decoded audio representation on the basis of an encoded audio representation,

wherein the audio decoder comprises a filter for providing an enhanced audio representation of the decoded audio representation,

wherein the filter is configured to determine a plurality of scaling values associated with a current frame on the basis of spectral values of the decoded audio representation, which are associated with different frequency bins or frequency ranges, of one or more frames following the current frame.
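
Claim 24 ties the scaling values of the current frame to spectral values of frames that come after it, which in practice means the decoder has to buffer some look-ahead before it can output the current frame. The sketch below illustrates only that dependency. The buffer layout, the function names and the averaging rule in `toy_model` are hypothetical; the claim does not say how the following frames are combined, and it does not exclude also using the current or earlier frames.

```python
def toy_model(features, n_bins):
    """Hypothetical placeholder for a learned model.

    Averages the magnitude of each bin across the look-ahead frames and
    maps the average to a gain in the interval [0, 1).
    """
    n_frames = len(features) // n_bins
    gains = []
    for b in range(n_bins):
        avg = sum(features[f * n_bins + b] for f in range(n_frames)) / n_frames
        gains.append(avg / (1.0 + avg))
    return gains


def gains_with_lookahead(frames, current, lookahead, model=toy_model):
    """Return gains for frame `current` computed from the frames that follow it."""
    future = frames[current + 1 : current + 1 + lookahead]
    if len(future) < lookahead:
        raise ValueError("not enough following frames buffered")
    # Concatenate per-bin magnitudes of the following frames as model input.
    features = [abs(v) for frame in future for v in frame]
    return model(features, n_bins=len(frames[current]))


# Three buffered frames with two frequency bins each (made-up numbers).
frames = [[1.0, 2.0], [1.0, 3.0], [3.0, 1.0]]
gains = gains_with_lookahead(frames, current=0, lookahead=2)
enhanced = [g * v for g, v in zip(gains, frames[0])]
```

With a look-ahead of two frames, the last two frames in the buffer cannot be processed until more frames arrive, which is the latency cost of this arrangement.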

25. A method for providing an enhanced audio representation on the basis of an encoded audio representation,

wherein the method comprises providing a decoded audio representation of the encoded audio representation,

wherein the method comprises scaling spectral values of the decoded audio signal representation, or a pre-processed version thereof, using the scaling values, to acquire the enhanced audio representation;

wherein a plurality of scaling values associated with a current frame are determined on the basis of spectral values of the decoded audio representation, which are associated with different frequency bins or frequency ranges, of one or more frames following the current frame.