US 12,272,376 B2
Dialog enhancement using adaptive smoothing which depends exponentially on a smoothing factor
Xuemei Yu, Beijing (CN)
Assigned to Dolby Laboratories Licensing Corporation, San Francisco, CA (US)
Appl. No. 17/638,839
Filed by Dolby Laboratories Licensing Corporation, San Francisco, CA (US)
PCT Filed Aug. 26, 2020, PCT No. PCT/US2020/048034 § 371(c)(1), (2) Date Feb. 27, 2022, PCT Pub. No. WO2021/041568, PCT Pub. Date Jun. 4, 2021.
Claims priority of provisional application 62/963,711, filed on Jan. 21, 2020.
Claims priority of provisional application 62/900,969, filed on Sep. 16, 2019.
Claims priority of application No. PCT/CN2019/102775 (WO), filed on Aug. 27, 2019.
Prior Publication US 2022/0277766 A1, Sep. 1, 2022
Int. Cl. G10L 25/81 (2013.01); G10L 21/0208 (2013.01); G10L 25/60 (2013.01); H04R 1/10 (2006.01); H04R 5/04 (2006.01); G10L 25/78 (2013.01)

CPC G10L 25/81 (2013.01) [G10L 21/0208 (2013.01); G10L 2025/783 (2013.01)]

14 Claims

1. A method of enhancing dialog intelligibility in an audio signal, comprising, for each frame of the audio signal:

determining, by a speech classifier, a speech confidence score that the frame includes speech content;

determining, by a music classifier, a music confidence score that the frame includes music correlated content; and

in response to said speech confidence score, applying, by a dialog enhancement module, a user selected gain of selected frequency bands of the audio signal to obtain a dialog enhanced audio signal, the method is characterized in that the user selected gain is smoothed by an adaptive smoothing algorithm, an impact of past frames in said smoothing algorithm being determined by a smoothing factor,

said smoothing factor being selected in response to said music confidence score, and having a relatively higher value for content having a relatively higher music confidence score and a relatively lower value for speech content having a relatively lower music confidence score, so as to increase the impact of past frames on the dialog enhancement of music correlated content,

wherein the adaptive smoothing algorithm is a weighted average according to:

Out(n)=αOut(n−1)+(1−α)In(n),

where Out(n) is a smoothed output gain of the current frame, Out(n−1) is a smoothed output gain of the previous frame, In(n) is an original input gain of the current frame, and α is an adaptively adjusted variable that varies exponentially as a function of the smoothing factor.
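
The weighted average recited above can be illustrated with a minimal Python sketch. The claim does not give the exact exponential relation between α and the smoothing factor, nor how the smoothing factor is derived from the music confidence score, so both mappings below are assumptions made only for illustration: α = 0.5^(frame duration / smoothing factor), with the smoothing factor treated as a time in seconds, and a linear interpolation of the smoothing factor between two hypothetical bounds. The function names, the 20 ms frame duration, and the bounds of 0.05 s and 0.5 s are likewise hypothetical.

```python
def select_smoothing_factor(music_confidence, low=0.05, high=0.5):
    """Hypothetical mapping: a higher music confidence score yields a
    larger smoothing factor (here interpreted as seconds)."""
    c = min(max(music_confidence, 0.0), 1.0)  # clamp score to [0, 1]
    return low + (high - low) * c


def alpha_from_factor(smoothing_factor, frame_duration=0.02):
    """Assumed exponential form; the claim only states that alpha
    varies exponentially as a function of the smoothing factor."""
    return 0.5 ** (frame_duration / smoothing_factor)


def smooth_gains(input_gains, music_confidences, initial=0.0):
    """Apply Out(n) = alpha * Out(n-1) + (1 - alpha) * In(n) per frame."""
    out_prev = initial
    smoothed = []
    for gain_in, confidence in zip(input_gains, music_confidences):
        alpha = alpha_from_factor(select_smoothing_factor(confidence))
        out_prev = alpha * out_prev + (1.0 - alpha) * gain_in
        smoothed.append(out_prev)
    return smoothed


if __name__ == "__main__":
    step = [6.0] * 10  # user selected gain switched on for 10 frames
    print(smooth_gains(step, [0.0] * 10)[-1])  # speech-like frames
    print(smooth_gains(step, [1.0] * 10)[-1])  # music-like frames
```

Under these assumptions, frames with a high music confidence score receive an α closer to 1, so past frames weigh more heavily and the smoothed gain rises more slowly than it does for speech-like frames.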