US 12,475,906 B2
Method and system for multiple time resolution audio processing
Johannes Traa, Medford, MA (US); Atulya Yellepeddi, Medford, MA (US); and Donald F. Porges, Watertown, MA (US)
Assigned to Analog Devices, Inc., Wilmington, MA (US)
Filed by Analog Devices, Inc., Wilmington, MA (US)
Filed on Aug. 16, 2023, as Appl. No. 18/450,784.
Prior Publication US 2025/0061911 A1, Feb. 20, 2025
Int. Cl. G10L 21/0224 (2013.01); G10L 21/0216 (2013.01)
CPC G10L 21/0224 (2013.01) [G10L 21/0216 (2013.01); G10L 2021/02166 (2013.01)]
23 Claims
 
1. A method for voice control, comprising:
transforming, using a short-time Fourier transform (STFT) applied to data in each of a plurality of windows aligned across each input channel of a multichannel audio stream, the multichannel audio stream into a complex valued frequency-domain representation,
wherein for a current one of the plurality of windows, the method comprises:
updating a first complex-valued covariance matrix corresponding to a slowly-adapting beamformer and forming a single-channel denoised estimate for each frequency band in the STFT;
calculating a voice activity detection (VAD) estimate for each frequency band in the STFT by comparing a magnitude of the single-channel denoised estimate to a magnitude of each input channel of the multichannel audio stream; and
selectively updating or refraining from updating, responsive to the VAD estimate respectively indicating a presence or an absence of speech, a second complex-valued covariance matrix corresponding to a quickly-adapting beamformer; and
controlling, by a hardware processor, a voice user interface based device to perform a user perceptible action, responsive to an output of at least the quickly-adapting beamformer.
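
The per-window steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and several details are assumptions that the claim does not specify: the exponential forgetting factors `slow` and `fast`, the use of the dominant eigenvector of each covariance matrix as the beamformer weights, the VAD rule (speech is declared in a band when the magnitude of the denoised estimate is at least `vad_ratio` times the magnitude of every input channel), and the helper names `make_state`, `update_covariance`, `principal_weights`, and `process_window`. The STFT itself is taken as given: `X[k][c]` is the complex coefficient of frequency band `k` and channel `c` for the current window.

```python
def make_state(num_bands, num_channels, eps=1e-6):
    """Per-band covariance matrices, initialised to a small scaled identity."""
    def eye():
        return [[(eps if i == j else 0.0) + 0j for j in range(num_channels)]
                for i in range(num_channels)]
    return {"R_slow": [eye() for _ in range(num_bands)],
            "R_fast": [eye() for _ in range(num_bands)]}


def update_covariance(R, x, forget):
    """Recursive average R <- forget * R + (1 - forget) * x x^H (assumed form)."""
    n = len(x)
    return [[forget * R[i][j] + (1.0 - forget) * x[i] * x[j].conjugate()
             for j in range(n)] for i in range(n)]


def principal_weights(R, iters=30):
    """Unit-norm dominant eigenvector of Hermitian R by power iteration.

    Stand-in weight rule only: the claim does not say how beamformer
    weights are derived from the covariance matrix.
    """
    n = len(R)
    w = [1.0 + 0j] + [0j] * (n - 1)
    for _ in range(iters):
        v = [sum(R[i][j] * w[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(c) ** 2 for c in v) ** 0.5
        if norm == 0.0:
            break
        w = [c / norm for c in v]
    return w


def beamform(w, x):
    """Single-channel output y = w^H x."""
    return sum(wi.conjugate() * xi for wi, xi in zip(w, x))


def process_window(X, state, slow=0.99, fast=0.7, vad_ratio=1.0):
    """Process one STFT window; returns (fast beamformer outputs, VAD flags)."""
    outputs, vad = [], []
    for k, x in enumerate(X):
        # Slowly-adapting beamformer: its covariance is updated on every window.
        state["R_slow"][k] = update_covariance(state["R_slow"][k], x, slow)
        y_slow = beamform(principal_weights(state["R_slow"][k]), x)

        # Per-band VAD: compare |y_slow| with the magnitude of each channel.
        speech = abs(y_slow) > 0.0 and all(
            abs(y_slow) >= vad_ratio * abs(c) for c in x)

        # Quickly-adapting beamformer: covariance updated only on speech.
        if speech:
            state["R_fast"][k] = update_covariance(state["R_fast"][k], x, fast)
        outputs.append(beamform(principal_weights(state["R_fast"][k]), x))
        vad.append(speech)
    return outputs, vad
```

In this sketch a caller would create the state once with `make_state(num_bands, num_channels)` and then call `process_window` for each successive window. The returned outputs of the quickly-adapting beamformer would feed whatever downstream logic drives the voice user interface, which is outside the scope of the example.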