US 11,705,103 B2
Audio system and signal processing method of voice activity detection for an ear mountable playback device
Peter McCutcheon, Premstätten (AT); and Dylan Morgan, Premstätten (AT)
Assigned to ams AG, Premstätten (AT)
Appl. No. 17/440,984
Filed by ams AG, Premstätten (AT)
PCT Filed Mar. 17, 2020, PCT No. PCT/EP2020/057286 § 371(c)(1), (2) Date Sep. 20, 2021, PCT Pub. No. WO2020/193286, PCT Pub. Date Oct. 1, 2020.
Claims priority of application No. 19164680 (EP), filed on Mar. 22, 2019; and application No. 19187045 (EP), filed on Jul. 18, 2019.
Prior Publication US 2022/0165245 A1, May 26, 2022
Int. Cl. G10K 11/178 (2006.01); G10L 25/78 (2013.01)

CPC G10K 11/1783 (2018.01) [G10K 11/17823 (2018.01); G10K 11/17825 (2018.01); G10K 11/17854 (2018.01); G10K 11/17881 (2018.01); G10L 25/78 (2013.01); G10K 2210/1081 (2013.01); G10K 2210/3026 (2013.01); G10K 2210/3027 (2013.01); G10K 2210/3028 (2013.01); G10K 2210/3044 (2013.01); G10L 2025/783 (2013.01)]

9 Claims

1. A signal processing method of voice activity detection for an ear mountable playback device comprising a speaker, an error microphone predominantly sensing sound being output from the speaker and also sensing ambient sound, and a feed-forward microphone predominantly sensing ambient sound, the method comprising the steps of:

using a voice activity detector:

recording a feed-forward signal from the feed-forward microphone,

recording an error signal from the error microphone,

determining at least one detection parameter as a function of the feed-forward signal and the error signal, and

monitoring the at least one detection parameter and setting a voice activity state depending on the at least one detection parameter; and

further, using an adaptive noise cancellation controller coupled to the feed-forward microphone and to the error microphone:

performing noise cancellation processing depending on the feed-forward signal and/or the error signal, and by using a filter coupled to the feed-forward microphone and to the speaker, having a filter transfer function determined by the noise cancellation processing,

wherein the detection parameter:

is based on a ratio of the feed-forward signal and the error signal,

the method comprising the further steps, using the voice activity detector:

monitoring a sound signal played from the device, and

determining one of the following voice activity states: false, true, or likely,

the voice activity state being true indicates that voice is detected, and

the voice activity state being false indicates that voice is not detected,

the method comprising the further steps, using the voice activity detector:

controlling the adaptive noise cancellation controller depending on the voice activity state,

the method being characterized by further comprising the steps of:

using the voice activity detector, entering either a first mode of operation or a second mode of operation when the detection parameter is larger than a first threshold or smaller than the first threshold, respectively,

in the first mode of operation:

analyzing a phase difference between the feed-forward signal and the error signal and

setting the voice activity state depending on the analyzed phase difference,

in the second mode of operation:

analyzing a level of tonality of the error signal and

setting the voice activity state depending on the analyzed level of tonality,

the method comprising the further steps, using the voice activity detector:

determining whether or not the sound signal is active, and if the sound signal is active, entering a fourth mode of operation, wherein:

using the voice activity detector, the second mode of operation is entered if the detection parameter is smaller than the first threshold, and

if the detection parameter exceeds the first threshold, a combined first and second mode of operation is entered, the combined first and second mode of operation comprising, using the voice activity detector, setting the voice activity state depending on both the analyzed phase difference and level of tonality.
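
The mode selection recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions made only for illustration: the ratio is computed from RMS levels of non-empty frames, the phase and tonality analyses are reduced to hypothetical boolean results (phase_vote, tonality_vote), and the rule that maps the combined mode to true, likely, or false is invented here. All function and variable names are hypothetical.

```python
import math
from enum import Enum


class VoiceState(Enum):
    FALSE = "false"
    TRUE = "true"
    LIKELY = "likely"


def rms(samples):
    """Root-mean-square level of one non-empty frame of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detection_parameter(ff_frame, err_frame, eps=1e-12):
    """Assumed form of the ratio: feed-forward RMS over error RMS."""
    return rms(ff_frame) / (rms(err_frame) + eps)


def select_mode(param, first_threshold, playback_active):
    """Pick the analysis mode from the threshold test in the claim."""
    if param > first_threshold:
        # With playback active (fourth mode), both analyses are combined.
        return "combined" if playback_active else "first"
    # The claim does not address equality; it is treated as "smaller" here.
    return "second"


def voice_state(mode, phase_vote, tonality_vote):
    """Map analysis results to a voice activity state.

    phase_vote and tonality_vote are hypothetical booleans standing in
    for the phase-difference and tonality analyses, which are not
    modeled in this sketch.
    """
    if mode == "first":
        return VoiceState.TRUE if phase_vote else VoiceState.FALSE
    if mode == "second":
        return VoiceState.TRUE if tonality_vote else VoiceState.FALSE
    # Combined mode: assumed voting rule, not taken from the claim.
    if phase_vote and tonality_vote:
        return VoiceState.TRUE
    if phase_vote or tonality_vote:
        return VoiceState.LIKELY
    return VoiceState.FALSE
```

In this sketch a caller would compute detection_parameter once per frame, pass it to select_mode together with the playback flag, and then use the resulting voice_state to decide how to control the adaptive noise cancellation controller.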