US 12,230,281 B2
Downmixer and method of downmixing
Franz Reutelhuber, Erlangen (DE); Bernd Edler, Erlangen (DE); Eleni Fotopoulou, Erlangen (DE); Markus Multrus, Erlangen (DE); Pallavi Maben, Erlangen (DE); and Sascha Disch, Erlangen (DE)
Assigned to FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E.V., Munich (DE)
Filed by Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V., Munich (DE)
Filed on Aug. 12, 2021, as Appl. No. 17/400,872.
Application 17/400,872 is a continuation of application No. PCT/EP2020/055669, filed on Mar. 4, 2020.
Claims priority of application No. 19161076 (EP), filed on Mar. 6, 2019.
Prior Publication US 2021/0375293 A1, Dec. 2, 2021
Int. Cl. G10L 19/008 (2013.01); G10L 19/087 (2013.01); H04S 3/00 (2006.01)

CPC G10L 19/008 (2013.01) [G10L 19/087 (2013.01); H04S 3/00 (2013.01); H04S 2400/03 (2013.01)]

40 Claims

1. A downmixer for downmixing a multi-channel audio signal having at least two audio channels, comprising:

a weighting value estimator configured for estimating band-wise weighting values for the at least two audio channels;

a spectral weighter configured for weighting spectral domain representations of the at least two audio channels using the band-wise weighting values;

a converter configured for converting weighted spectral domain representations of the at least two audio channels into time representations of the at least two audio channels, wherein the converter is configured to generate raw time representations using a spectrum-time algorithm, and to post process, using a post-processor, the raw time representations to acquire the time representations; and

a mixer configured for mixing the time representations of the at least two audio channels to acquire a downmix audio signal,

wherein the spectral domain representations are either purely real or purely imaginary, wherein the weighting value estimator is configured to obtain an estimated imaginary spectral domain representation when a spectral domain representation of the spectral domain representations is purely real, or to obtain an estimated real spectral domain representation when the spectral domain representation of the spectral domain representations is purely imaginary, and wherein the weighting value estimator is configured to estimate the band-wise weighting values using the estimated imaginary spectral domain representation or the estimated real spectral domain representation, or

wherein a first spectral domain representation of a first audio channel of the at least two audio channels comprises a first time resolution or a first frequency resolution, wherein a second spectral domain representation of a second audio channel of the at least two audio channels comprises a second time resolution or a second frequency resolution, wherein the second time resolution or the second frequency resolution is different from the first time resolution or the first frequency resolution, and wherein the weighting value estimator is configured to calculate the band-wise weighting values so that a frequency resolution of a plurality of bands associated with the band-wise weighting values is lower than the first frequency resolution and the second frequency resolution or is equal to the lower one of the first frequency resolution and the second frequency resolution, or

wherein the first spectral domain representation comprises a first plurality of spectral values in a band, wherein the second spectral domain representation comprises a second plurality of spectral values in the band, the second plurality of spectral values being higher than the first plurality of spectral values, and wherein the weighting value estimator is configured to combine two or more spectral values of the second plurality of spectral values or to select, from the second plurality of spectral values, a subset of spectral values, to calculate a mixed term depending on products or linear combinations of spectral values from the at least two audio channels in the band using a result of combining the two or more spectral values of the second plurality of spectral values or using the subset of spectral values, and to calculate the band-wise weighting values using the mixed term, or

wherein the first spectral domain representation of the spectral domain representations comprises a plurality of first spectral values representing a first time bin size and a first frequency bin size, wherein the second spectral domain representation of the spectral domain representations comprises a plurality of spectral values representing a second time bin size and a second frequency bin size, wherein the first time bin size is greater than the second time bin size, or wherein the first frequency bin size is lower than the second frequency bin size, and wherein the weighting value estimator is configured to combine a plurality of spectral values from the first spectral domain representation to acquire a first combined spectral domain representation in which a combined frequency bin size is equal to the second frequency bin size, or to combine a plurality of spectral values from the second spectral domain representation to acquire a first combined spectral domain representation in which a combined time bin size is equal to the first time bin size, or

wherein the first spectral domain representation of the first audio channel of the at least two audio channels comprises a plurality of first spectral values representing the first time bin size and the first frequency bin size, wherein the second spectral domain representation of the second audio channel of the at least two audio channels comprises at least two subframes, wherein each subframe of the at least two subframes comprises a plurality of spectral values representing a second time bin size and a second frequency bin size, wherein the first time bin size is greater than the second time bin size, or wherein the first frequency bin size is lower than the second frequency bin size, wherein the weighting value estimator is configured to combine spectral values belonging to an identical frequency bin from each subframe of the at least two subframes of the second spectral domain representation in a first manner to acquire a first group of combined spectral values, and to combine spectral values belonging to an identical frequency bin from each subframe of the at least two subframes of the second spectral domain representation in a second manner to acquire a second group of combined spectral values, the second manner being different from the first manner, wherein the first group of combined spectral values and the second group of combined spectral values represent a combined spectral domain representation comprising the first time bin size and the first frequency bin size, and to use the spectral values of the combined spectral domain representation and the first spectral domain representation for the estimating of the band-wise weighting values, or

wherein the weighting value estimator is configured to calculate a plurality of first band-wise weighting values for a plurality of bands of the first audio channel of the at least two audio channels using a first calculation rule depending on at least two of spectral values of the first spectral domain representation of the first audio channel of the at least two audio channels, spectral values of the second spectral domain representation of the second audio channel of the at least two audio channels, spectral values of a single combined spectral domain representation derived from the spectral values of the first spectral domain representation or the second spectral domain representation, spectral values of a first combined spectral domain representation derived from the spectral values of the first spectral domain representation, and spectral values of a second combined spectral domain representation derived from the spectral values of the second spectral domain representation, and wherein the weighting value estimator is configured to calculate a plurality of second band-wise weighting values for the plurality of bands of the first audio channel of the at least two audio channels using a second calculation rule depending on at least two of the plurality of first band-wise weighting values, the spectral values of the first spectral domain representation of the first audio channel of the at least two audio channels, the spectral values of the second spectral domain representation of the second audio channel of the at least two audio channels, the spectral values of the single combined spectral domain representation derived from the spectral values of the first spectral domain representation or the second spectral domain representation, the spectral values of the first combined spectral domain representation derived from the spectral values of the first spectral domain representation, and the spectral values of the second combined spectral domain representation derived from the spectral values of the second spectral domain representation, wherein the second calculation rule is different from the first calculation rule;

wherein at least one of the weighting value estimator, the spectral weighter, the converter, and the mixer comprises a hardware implementation.
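
For illustration only, the signal flow recited in claim 1 (band-wise weight estimation, spectral weighting, conversion to the time domain, and time-domain mixing) might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation. The claim does not fix a weighting formula, so the energy-matching rule, the clamp max_weight, the use of a DCT-IV as the spectrum-time algorithm, the pass-through post_process placeholder, and all function names are hypothetical. The sketch works on purely real spectra and, for brevity, skips the estimation of the missing imaginary part that the claim describes.

```python
import math


def dct_iv(values):
    """DCT-IV of a real sequence; applying it twice and scaling by 2/N inverts it."""
    n = len(values)
    return [sum(v * math.cos(math.pi / n * (i + 0.5) * (k + 0.5))
                for i, v in enumerate(values))
            for k in range(n)]


def estimate_band_weights(spec_a, spec_b, band_edges, max_weight=2.0):
    """One weight per band (assumed rule: match the sum's energy to E_a + E_b)."""
    weights = []
    for lo, hi in zip(band_edges, band_edges[1:]):
        pairs = list(zip(spec_a[lo:hi], spec_b[lo:hi]))
        energy = sum(a * a + b * b for a, b in pairs)
        mixed = sum(a * b for a, b in pairs)  # mixed term built from products
        sum_energy = energy + 2.0 * mixed     # energy of (a + b) in this band
        weight = math.sqrt(energy / max(sum_energy, 1e-12))
        weights.append(min(max_weight, weight))  # clamp when channels cancel
    return weights


def apply_weights(spec, weights, band_edges):
    """Scale every spectral value by the weight of the band it falls in."""
    out = list(spec)
    for w, lo, hi in zip(weights, band_edges, band_edges[1:]):
        for k in range(lo, hi):
            out[k] = w * spec[k]
    return out


def post_process(raw):
    """Hypothetical placeholder for the post-processor; passes samples through."""
    return list(raw)


def to_time(spec):
    """Spectrum-time conversion: raw time samples, then post-processing."""
    n = len(spec)
    raw = [2.0 / n * v for v in dct_iv(spec)]
    return post_process(raw)


def downmix(time_a, time_b, band_edges):
    """Weight both channels in the spectral domain, then mix in the time domain."""
    spec_a, spec_b = dct_iv(time_a), dct_iv(time_b)
    weights = estimate_band_weights(spec_a, spec_b, band_edges)
    out_a = to_time(apply_weights(spec_a, weights, band_edges))
    out_b = to_time(apply_weights(spec_b, weights, band_edges))
    return [a + b for a, b in zip(out_a, out_b)]
```

Under this assumed rule, two identical channels receive a weight of about 0.707 in every band, so their sum keeps the combined energy of the inputs rather than doubling in amplitude. Channels in exact anti-phase hit the clamp instead of producing an unbounded weight.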