US 12,354,617 B2
Context-aware voice intelligibility enhancement
Daekyoung Noh, Calabasas, CA (US); Pavel Chubarev, Calabasas, CA (US); and Xiaoyu Guo, Calabasas, CA (US)
Assigned to DTS, Inc., Calabasas, CA (US)
Filed by DTS, Inc., Calabasas, CA (US)
Filed on Feb. 11, 2022, as Appl. No. 17/669,615.
Application 17/669,615 is a continuation of application No. PCT/US2020/049933, filed on Sep. 9, 2020.
Claims priority of provisional application 62/898,977, filed on Sep. 11, 2019.
Prior Publication US 2022/0165287 A1, May 26, 2022
Int. Cl. G10L 21/0232 (2013.01); G10L 15/18 (2013.01); G10L 15/22 (2006.01); G10L 21/0208 (2013.01); G10L 21/0216 (2013.01); G10L 21/038 (2013.01)
CPC G10L 21/0232 (2013.01) [G10L 15/18 (2013.01); G10L 15/22 (2013.01); G10L 21/038 (2013.01); G10L 2021/02082 (2013.01); G10L 2021/02163 (2013.01)] 18 Claims
 
1. A method comprising:
detecting noise in an environment with a microphone to produce a noise signal;
receiving a voice signal to be played into the environment through a loudspeaker;
determining a frequency analysis region for a multiband voice intelligibility computation based on a relationship between a microphone transfer function of the microphone and a loudspeaker transfer function of the loudspeaker;
performing multiband correction of the noise signal based on the microphone transfer function, to produce a corrected noise signal;
performing multiband correction of the voice signal based on the loudspeaker transfer function, to produce a corrected voice signal;
computing a global speech-to-noise ratio of (i) voice power based on the voice signal across voice analysis bands limited to an overlap passband to (ii) noise power based on the noise signal across a microphone passband; and
computing multiband voice intelligibility results over the frequency analysis region based on the corrected noise signal and the corrected voice signal, wherein the multiband voice intelligibility results include long segments analyzed by long-term voice and noise profiling obtained based on an accumulation of short-term voice intelligibility results over time with a sliding window.
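Claim 1 does not say how the microphone passband, the loudspeaker passband, or their overlap are determined. The following is a minimal sketch under stated assumptions, not the patented method. It assumes that each passband is the span of frequencies whose magnitude response lies within a hypothetical `drop_db` (10 dB here) of its peak, and that per-band powers are already available. The names `passband`, `overlap_passband`, and `global_snr_db` are illustrative and do not come from the patent. The sketch takes the frequency analysis region as the intersection of the two passbands. It then forms the global speech-to-noise ratio from voice power restricted to the overlap passband and noise power taken over the microphone passband.

```python
import math


def passband(freqs_hz, mag_db, drop_db=10.0):
    """Hypothetical passband rule: lowest and highest frequencies whose
    magnitude response is within drop_db of the peak. Assumes the in-band
    region is contiguous."""
    peak = max(mag_db)
    inside = [f for f, m in zip(freqs_hz, mag_db) if m >= peak - drop_db]
    return min(inside), max(inside)


def overlap_passband(freqs_hz, mic_db, spk_db, drop_db=10.0):
    """Frequency analysis region: intersection of microphone and loudspeaker passbands."""
    mic_lo, mic_hi = passband(freqs_hz, mic_db, drop_db)
    spk_lo, spk_hi = passband(freqs_hz, spk_db, drop_db)
    lo, hi = max(mic_lo, spk_lo), min(mic_hi, spk_hi)
    if lo > hi:
        raise ValueError("microphone and loudspeaker passbands do not overlap")
    return lo, hi


def global_snr_db(centers_hz, voice_power, noise_power, overlap, mic_band):
    """Voice power summed over bands inside the overlap passband, divided by
    noise power summed over bands inside the microphone passband, in dB."""
    v = sum(p for f, p in zip(centers_hz, voice_power) if overlap[0] <= f <= overlap[1])
    n = sum(p for f, p in zip(centers_hz, noise_power) if mic_band[0] <= f <= mic_band[1])
    return 10.0 * math.log10(v / n)


# Illustrative (made-up) magnitude responses sampled at a few frequencies.
freqs = [100, 200, 500, 1000, 2000, 4000, 8000]
mic_db = [-20, -5, 0, 0, 0, -3, -15]
spk_db = [-30, -15, -6, 0, -2, -8, -25]
region = overlap_passband(freqs, mic_db, spk_db)
print(region)  # (500, 4000)
```

Because the microphone passband here (200 Hz to 4 kHz) is wider than the loudspeaker passband (500 Hz to 4 kHz), the analysis region is limited by the loudspeaker. Voice energy outside that region is excluded from the ratio, but noise energy across the whole microphone passband still counts.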
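The two multiband correction steps and the per-band intelligibility computation can be illustrated in the dB domain. This is a hedged sketch that assumes correction means removing the microphone's per-band gain from the captured noise and applying the loudspeaker's per-band gain to the voice. The patent does not specify an intelligibility measure, so this sketch substitutes one: an unweighted mean of the SII-style band audibility term (SNR + 15) / 30, clipped to [0, 1], from ANSI S3.5. The functions `correct_bands` and `band_audibility` are hypothetical names.

```python
def correct_bands(levels_db, response_db, invert):
    """Per-band transfer-function correction in dB.
    invert=True removes the response (microphone: estimate the acoustic noise
    from what the mic captured); invert=False applies it (loudspeaker:
    estimate what actually reaches the listener)."""
    sign = -1.0 if invert else 1.0
    return [lvl + sign * r for lvl, r in zip(levels_db, response_db)]


def band_audibility(voice_db, noise_db, centers_hz, region):
    """Unweighted mean of clipped (SNR + 15) / 30 over bands inside the region.
    This is a swapped-in stand-in, not the patent's intelligibility measure."""
    scores = []
    for f, v, n in zip(centers_hz, voice_db, noise_db):
        if region[0] <= f <= region[1]:
            snr = v - n
            scores.append(min(max((snr + 15.0) / 30.0, 0.0), 1.0))
    return sum(scores) / len(scores)


centers = [250, 500, 1000, 2000, 4000]
noise = correct_bands([50, 50, 50, 50, 50], [-6, 0, 0, 0, 3], invert=True)
voice = correct_bands([60, 60, 60, 60, 60], [-20, -3, 0, 0, -6], invert=False)
score = band_audibility(voice, noise, centers, (500, 4000))
print(round(score, 4))  # 0.7833
```

The 250 Hz band is skipped because it lies outside the analysis region. In this example, the loudspeaker's weak low-frequency output would otherwise dominate the score even though no correction could recover it.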
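The final clause of claim 1 describes long-term profiling built by accumulating short-term intelligibility results in a sliding window. A minimal sketch is shown below, assuming that the long-term value is simply the mean of the most recent `window_frames` short-term scores. The class name, the window length, and the choice of a mean are illustrative assumptions, since the claim names none of them.

```python
from collections import deque


class LongTermProfile:
    """Hypothetical long-term profile: mean of the most recent short-term
    intelligibility scores held in a fixed-length sliding window."""

    def __init__(self, window_frames):
        if window_frames <= 0:
            raise ValueError("window_frames must be positive")
        # deque with maxlen discards the oldest score once the window is full.
        self._window = deque(maxlen=window_frames)

    def update(self, short_term_score):
        """Add one short-term result and return the current long-term value."""
        self._window.append(short_term_score)
        return sum(self._window) / len(self._window)


profile = LongTermProfile(window_frames=3)
history = [profile.update(s) for s in [0.2, 0.4, 0.6, 0.8]]
print([round(x, 3) for x in history])  # [0.2, 0.3, 0.4, 0.6]
```

A bounded `deque` keeps memory constant per stream. Averaging over the window smooths frame-to-frame fluctuations, so decisions based on the long-term value react to sustained changes in voice or noise rather than to momentary ones.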