US 12,431,145 B2
Immersive voice and audio services (IVAS) with adaptive downmix strategies
Harald Mundt, Fürth (DE); David S. McGrath, Rose Bay (AU); and Rishabh Tyagi, Sydney (AU)
Assigned to Dolby Laboratories Licensing Corporation, San Francisco, CA (US); and Dolby International AB, Dublin (IE)
Appl. No. 18/327,623
Filed by Dolby Laboratories Licensing Corporation, San Francisco, CA (US); and Dolby International AB, Dublin (IE)
PCT Filed Dec. 2, 2021, PCT No. PCT/US2021/061671
§ 371(c)(1), (2) Date Jun. 1, 2023,
PCT Pub. No. WO2022/120093, PCT Pub. Date Jun. 9, 2022.
Claims priority of provisional application 63/228,732, filed on Aug. 3, 2021.
Claims priority of provisional application 63/171,404, filed on Apr. 6, 2021.
Claims priority of provisional application 63/120,365, filed on Dec. 2, 2020.
Prior Publication US 2024/0135937 A1, Apr. 25, 2024
Int. Cl. G10L 19/008 (2013.01); G10L 19/083 (2013.01); H04S 7/00 (2006.01)
CPC G10L 19/008 (2013.01) [G10L 19/083 (2013.01); H04S 7/00 (2013.01); H04S 2400/03 (2013.01)]
1 Claim
1. An audio signal encoding method comprising:
obtaining, with at least one processor, an input audio signal, the input audio signal representing an input audio scene and comprising a primary input audio channel and side channels;
determining, with the at least one processor, a type of downmix coding scheme based on the input audio signal;
based on the type of downmix coding scheme:
computing, with the at least one processor, one or more input downmixing gains to be applied to the input audio signal to construct a primary downmix channel, wherein the input downmixing gains are determined to minimize an overall prediction error on the side channels;
determining, with the at least one processor, one or more downmix scaling gains to scale the primary downmix channel, wherein the downmix scaling gains are determined by minimizing an energy difference between a reconstructed representation of the input audio scene from the primary downmix channel and the input audio signal;
generating, with the at least one processor, prediction gains based on the input audio signal, the input downmixing gains and the downmix scaling gains;
determining, with the at least one processor, one or more residual channels from the side channels in the input audio signal by using the primary downmix channel and the prediction gains to generate side channel predictions and then subtracting the side channel predictions from the side channels;
determining, with the at least one processor, decorrelation gains based on energy in the residual channels;
encoding, with the at least one processor, the primary downmix channel, zero or more of the residual channels and side information into a bitstream, the side information comprising the prediction gains and the decorrelation gains; and
outputting, with the at least one processor, the bitstream.
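
The order of operations recited in claim 1 can be illustrated with a minimal single-frame sketch in plain Python. It is not the patented implementation, and every specific rule in it is an assumption made for illustration: the energy-based choice between a "passive" and an "active" scheme, the coarse grid search standing in for the minimization of the side-channel prediction error, the scaling gain that simply matches the downmix energy to the primary input energy, and the square-root energy ratio used for the decorrelation gains. The names `encode_frame`, `mix`, `residual_energy` and `candidates` are hypothetical. Quantization, the choice of which residual channels to transmit, and bitstream packing are omitted.

```python
import math

EPS = 1e-12  # guards against division by zero on silent frames


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mix(primary, sides, gains):
    """Primary downmix channel: primary input plus gain-weighted side channels."""
    return [w + sum(g * s[n] for g, s in zip(gains, sides))
            for n, w in enumerate(primary)]


def residual_energy(sides, d):
    """Total side-channel energy left after least-squares prediction from d."""
    e_d = dot(d, d) + EPS
    return sum(dot(s, s) - dot(s, d) ** 2 / e_d for s in sides)


def encode_frame(primary, sides, candidates=(0.0, 0.25, 0.5, 0.75, 1.0)):
    e_w = dot(primary, primary)
    e_s = sum(dot(s, s) for s in sides)

    # Downmix coding scheme (assumed rule): passive when the primary dominates.
    scheme = "passive" if e_w >= e_s else "active"

    # Input downmixing gains. Assumption: search along the normalized
    # cross-correlation direction for the factor with least prediction error.
    if scheme == "passive":
        mix_gains = [0.0] * len(sides)
    else:
        direction = [dot(primary, s) / (e_w + EPS) for s in sides]
        best = min(
            candidates,
            key=lambda f: residual_energy(
                sides, mix(primary, sides, [f * u for u in direction])),
        )
        mix_gains = [best * u for u in direction]
    downmix = mix(primary, sides, mix_gains)

    # Downmix scaling gain. Assumption: match the energy of the scaled
    # downmix to the energy of the primary input channel.
    scale = math.sqrt((e_w + EPS) / (dot(downmix, downmix) + EPS))
    downmix = [scale * x for x in downmix]

    # Prediction gains: least-squares fit of each side channel to the downmix.
    e_d = dot(downmix, downmix) + EPS
    pred = [dot(s, downmix) / e_d for s in sides]

    # Residual channels: side channel minus its prediction.
    residuals = [[x - p * y for x, y in zip(s, downmix)]
                 for s, p in zip(sides, pred)]

    # Decorrelation gains from residual energy (assumed normalization).
    decorr = [math.sqrt(dot(r, r) / e_d) for r in residuals]

    return {
        "scheme": scheme,
        "downmix": downmix,
        "residuals": residuals,
        "prediction_gains": pred,
        "decorrelation_gains": decorr,
    }
```

In this sketch, a side channel that is a scaled copy of the primary channel is predicted exactly and leaves an all-zero residual, while a side channel orthogonal to the downmix gets a prediction gain of zero, so its entire energy is carried by the decorrelation gain.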