US 12,437,745 B2
Wearable electronic device for emitting a masking signal
Clément Laroche, Ballerup (DK); and Rasmus Kongsgaard Olsson, Ballerup (DK)
Assigned to GN AUDIO A/S, Ballerup (DK)
Filed by GN AUDIO A/S, Ballerup (DK)
Filed on Sep. 30, 2020, as Appl. No. 17/038,953.
Claims priority of application No. 19201470 (EP), filed on Oct. 4, 2019.
Prior Publication US 2021/0104222 A1, Apr. 8, 2021
Int. Cl. G10L 25/00 (2013.01); G10L 13/027 (2013.01); G10L 25/78 (2013.01)

CPC G10L 13/027 (2013.01) [G10L 25/78 (2013.01); G10L 2025/786 (2013.01)]

15 Claims

1. A wearable electronic device comprising:

an electro-acoustic input transducer configured to:

pick up an acoustic signal, and

convert the acoustic signal to a microphone signal (x);

a loudspeaker; and

a processor comprising:

a voice activity detector configured to detect voice activity based on processing time-domain waveforms of the microphone signal (x) by:

generating frames comprising frequency-time representations of the waveforms of the microphone signal (x), and

detecting the voice activity only when voice activity is determined to be in a predefined number of consecutive frames;

a machine learning component configured to:

generate a voice activity signal (y), concurrently with the microphone signal (x), by indicating periods of time in which the microphone signal (x) includes signal components that represent detected voice activity and signal components that represent detected voice inactivity; and

wherein the generated voice activity signal (y) is sequentially indicative of at least one voice activity and voice inactivity;

control a volume of a masking signal (m) in response to the generated voice activity signal (y) to be at a first volume when the generated voice activity signal (y) is indicative of voice activity and to be at a second volume when the generated voice activity signal (y) is indicative of voice inactivity; and

supply the masking signal (m) to the loudspeaker.
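
The processing chain recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the naive DFT, and the energy threshold are assumptions made for illustration, and the threshold in particular is only a hypothetical stand-in for the machine learning component. The sketch shows three steps: building frequency-time frames from the waveform x, asserting voice activity only after a predefined number of consecutive voiced frames, and selecting a first or second masking volume from the resulting signal y.

```python
import cmath
import math


def frames_spectra(x, frame_len, hop):
    """Split waveform x into frames and return one magnitude spectrum per frame."""
    spectra = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        # Naive DFT over the non-negative frequency bins (illustrative only).
        spectrum = [
            abs(sum(s * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n, s in enumerate(frame)))
            for k in range(frame_len // 2 + 1)
        ]
        spectra.append(spectrum)
    return spectra


def frame_is_voiced(spectrum, threshold):
    # ASSUMPTION: a plain energy threshold stands in for the
    # machine learning component named in the claim.
    return sum(m * m for m in spectrum) > threshold


def voice_activity_signal(spectra, threshold, min_consecutive):
    """y[i] is True only once min_consecutive frames in a row are voiced."""
    y, run = [], 0
    for spectrum in spectra:
        run = run + 1 if frame_is_voiced(spectrum, threshold) else 0
        y.append(run >= min_consecutive)
    return y


def masking_volume(y, first_volume, second_volume):
    """Pick the first volume for voice activity, the second for inactivity."""
    return [first_volume if active else second_volume for active in y]


# Two silent frames, four frames of a tone, one silent frame (8 samples each).
x = [0.0] * 16 + [math.sin(2 * math.pi * n / 4) for n in range(32)] + [0.0] * 8
y = voice_activity_signal(frames_spectra(x, 8, 8), threshold=1.0, min_consecutive=3)
volumes = masking_volume(y, first_volume=0.8, second_volume=0.2)
```

With `min_consecutive=3`, the first two tone frames do not yet raise y, which is the effect of the consecutive-frame condition. The volume values 0.8 and 0.2 are arbitrary; the claim only requires that the two volumes differ by voice activity state.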

11. A signal processing method at a wearable electronic device comprising:

an electro-acoustic input transducer arranged to pick up an acoustic signal and convert the acoustic signal to a microphone signal (x); a loudspeaker; and

a processor comprising a machine learning component that is configured to:

control a volume of a masking signal (m);

supply a masking signal (m) to the loudspeaker;

detect voice activity based on processing of at least time-domain waveforms of the microphone signal (x) that includes:

generating frames comprising frequency-time representations of the waveforms of the microphone signal (x), and

determining the voice activity only when voice activity is determined to be in a predefined number of consecutive frames; and

generate a voice activity signal (y) which is, concurrently with the microphone signal, sequentially indicative of one or more of voice activity and voice in-activity; and

control the volume of the masking signal (m) in accordance with a sound pressure level of the acoustic signal when supplying the masking signal (m) to the loudspeaker at a first volume at times when the voice activity signal (y) is indicative of voice activity and at a second volume at times when the voice activity signal (y) is indicative of voice in-activity.
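
The final step of claim 11, setting the masking volume in accordance with the sound pressure level of the acoustic signal, might be sketched as follows. This is an illustration under stated assumptions, not the method of the patent: the level is estimated as RMS in dB relative to full scale, which is only a proxy for sound pressure level without microphone calibration, and the offset, the clamping limits, and the choice of a muted second volume are all hypothetical.

```python
import math


def frame_level_db(frame, eps=1e-12):
    """RMS level of one frame in dB relative to full scale (proxy for SPL)."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20.0 * math.log10(max(rms, eps))


def masking_gain(level_db, voice_active,
                 active_offset_db=-6.0, floor_db=-60.0, ceiling_db=-10.0):
    """Linear gain for the masking signal m.

    ASSUMPTIONS: the second volume (voice inactivity) is taken to be muted,
    and the first volume tracks the measured level plus a fixed offset,
    clamped to [floor_db, ceiling_db].
    """
    if not voice_active:
        return 0.0
    target_db = min(max(level_db + active_offset_db, floor_db), ceiling_db)
    return 10.0 ** (target_db / 20.0)


level = frame_level_db([0.5, 0.5, 0.5, 0.5])
gain_active = masking_gain(level, voice_active=True)
gain_inactive = masking_gain(level, voice_active=False)
```

Tracking the measured level keeps the masking signal proportionate to the speech it is meant to cover, while the ceiling prevents the masker from growing without bound in loud surroundings.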