| CPC G10L 13/027 (2013.01) [G10L 25/78 (2013.01); G10L 2025/786 (2013.01)] | 15 Claims |

1. A wearable electronic device comprising:
an electro-acoustic input transducer configured to:
pick up an acoustic signal, and
convert the acoustic signal to a microphone signal (x);
a loudspeaker; and
a processor comprising:
a voice activity detector configured to detect voice activity based on processing time-domain waveforms of the microphone signal (x) by:
generating frames comprising frequency-time representations of the waveforms of the microphone signal (x), and
detecting the voice activity only when voice activity is determined to be in a predefined number of consecutive frames;
a machine learning component configured to:
generate a voice activity signal (y), concurrently with the microphone signal (x), by indicating periods of time in which the microphone signal (x) includes signal components that represent detected voice activity and signal components that represent detected voice inactivity,
wherein the generated voice activity signal (y) is sequentially indicative of at least one of voice activity and voice inactivity;
control a volume of a masking signal (m) in response to the generated voice activity signal (y) to be at a first volume when the generated voice activity signal (y) is indicative of voice activity and to be at a second volume when the generated voice activity signal (y) is indicative of voice inactivity; and
supply the masking signal (m) to the loudspeaker.
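The framing and consecutive-frame condition of claim 1 can be illustrated with a minimal Python sketch under stated assumptions. It is not the claimed implementation: the per-frame decision uses a spectral-energy threshold as a hypothetical stand-in for the machine learning component, and the function names (`frame_spectra`, `frame_is_voiced`, `detect_voice_activity`, `masking_volume`) and all numeric parameters (frame length, hop size, threshold, `min_consecutive`, and the two volumes) are illustrative assumptions not taken from the claims.

```python
import cmath
import math
from typing import List, Sequence


def frame_spectra(x: Sequence[float], frame_len: int = 64, hop: int = 32) -> List[List[float]]:
    """Split the time-domain waveform x into overlapping Hann-windowed frames and
    return each frame's magnitude spectrum (a simple frequency-time representation)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames: List[List[float]] = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT over the non-negative frequency bins; an FFT would be used in practice.
        spectrum = [
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len)))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


def frame_is_voiced(spectrum: Sequence[float], threshold: float) -> bool:
    """Hypothetical stand-in for the machine learning classifier: a frame counts
    as voiced when its mean spectral energy exceeds a fixed threshold."""
    return sum(m * m for m in spectrum) / len(spectrum) > threshold


def detect_voice_activity(frames: Sequence[Sequence[float]], threshold: float,
                          min_consecutive: int = 3) -> List[int]:
    """Return the voice activity signal y, one flag per frame: 1 only once at least
    `min_consecutive` consecutive frames have been classified as voiced, else 0."""
    y: List[int] = []
    run = 0  # length of the current run of voiced frames
    for spectrum in frames:
        run = run + 1 if frame_is_voiced(spectrum, threshold) else 0
        y.append(1 if run >= min_consecutive else 0)
    return y


def masking_volume(y_frame: int, first_volume: float = 1.0, second_volume: float = 0.2) -> float:
    """Map one voice activity flag to a masking-signal gain: the first volume during
    voice activity, the second during voice inactivity (values are assumptions)."""
    return first_volume if y_frame else second_volume


# Usage: silence, then a tone standing in for speech, then silence again.
x = [0.0] * 512 + [0.5 * math.sin(2 * math.pi * 8 * n / 64) for n in range(1024)] + [0.0] * 512
frames = frame_spectra(x)
y = detect_voice_activity(frames, threshold=0.01)
gains = [masking_volume(flag) for flag in y]
```

Requiring `min_consecutive` voiced frames before reporting activity suppresses isolated false detections from short transients, at the cost of a detection delay of roughly `(min_consecutive - 1) * hop` samples after the first voiced frame.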
11. A signal processing method at a wearable electronic device comprising:
an electro-acoustic input transducer arranged to pick up an acoustic signal and convert the acoustic signal to a microphone signal (x); a loudspeaker; and
a processor comprising a machine learning component that is configured to:
control a volume of a masking signal (m);
supply the masking signal (m) to the loudspeaker;
detect voice activity based on processing at least time-domain waveforms of the microphone signal (x), the processing including:
generating frames comprising frequency-time representations of the waveforms of the microphone signal (x), and
determining the voice activity only when voice activity is determined to be in a predefined number of consecutive frames;
generate a voice activity signal (y) which is, concurrently with the microphone signal (x), sequentially indicative of one or more of voice activity and voice inactivity; and
control the volume of the masking signal (m) in accordance with a sound pressure level of the acoustic signal when supplying the masking signal (m) to the loudspeaker at a first volume at times when the voice activity signal (y) is indicative of voice activity and at a second volume at times when the voice activity signal (y) is indicative of voice inactivity.
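The last step of claim 11 ties the masking volume both to the sound pressure level of the picked-up acoustic signal and to the voice activity signal (y). The claim does not say how the two are combined, so the Python sketch below assumes one simple rule: the masking level tracks the measured input level plus an offset selected by y. The names `level_db` and `masking_gain`, the offsets, the clamp limits, which of the two volumes is higher, and the use of an uncalibrated dBFS level in place of a true sound pressure level are all assumptions for illustration.

```python
import math
from typing import Sequence


def level_db(block: Sequence[float], ref: float = 1.0) -> float:
    """RMS level of a non-empty block of microphone samples in dB relative to `ref`.
    With ref = 1.0 this is dBFS; a calibrated ref would be needed to express true SPL."""
    if not block:
        raise ValueError("block must contain at least one sample")
    rms = math.sqrt(sum(s * s for s in block) / len(block))
    return 20.0 * math.log10(max(rms, 1e-12) / ref)


def masking_gain(block: Sequence[float], voice_active: bool,
                 active_offset_db: float = -6.0, inactive_offset_db: float = -20.0,
                 floor_db: float = -60.0, ceiling_db: float = 0.0) -> float:
    """Linear gain for the masking signal m (hypothetical mapping).

    The target masking level is the measured input level plus an offset chosen by
    the voice activity signal y. With these assumed defaults, the first volume
    (voice activity) is higher than the second (voice inactivity); the claims do
    not specify which is higher. The target is clamped to [floor_db, ceiling_db]."""
    offset = active_offset_db if voice_active else inactive_offset_db
    target_db = min(max(level_db(block) + offset, floor_db), ceiling_db)
    return 10.0 ** (target_db / 20.0)


# Usage: a steady input at -20 dBFS.
block = [0.1] * 100
g_active = masking_gain(block, voice_active=True)     # target -26 dB
g_inactive = masking_gain(block, voice_active=False)  # target -40 dB
```

The clamp keeps silence from driving the masking gain toward zero and keeps loud input from pushing it above full scale.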