US 12,217,765 B2
Robust short-time Fourier transform acoustic echo cancellation during audio playback
Daniele Giacobello, Los Angeles, CA (US)
Assigned to Sonos, Inc., Goleta, CA (US)
Filed by Sonos, Inc., Santa Barbara, CA (US)
Filed on May 5, 2023, as Appl. No. 18/313,013.
Application 18/313,013 is a continuation of application No. 17/327,911, filed on May 24, 2021, granted, now 11,646,045.
Application 17/327,911 is a continuation of application No. 16/600,644, filed on Oct. 14, 2019, granted, now 11,017,789, issued on May 25, 2021.
Application 16/600,644 is a continuation of application No. 15/717,621, filed on Sep. 27, 2017, granted, now 10,446,165, issued on Oct. 15, 2019.
Prior Publication US 2023/0395088 A1, Dec. 7, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 21/02 (2013.01); G10K 11/178 (2006.01); G10L 21/0208 (2013.01); G10L 21/0232 (2013.01); H04M 9/08 (2006.01); H04R 3/00 (2006.01); H04R 3/12 (2006.01); H04R 27/00 (2006.01); H04R 29/00 (2006.01)

CPC G10L 21/02 (2013.01) [G10K 11/178 (2013.01); G10L 21/0208 (2013.01); H04M 9/082 (2013.01); G10K 2210/3012 (2013.01); G10K 2210/3028 (2013.01); G10K 2210/505 (2013.01); G10L 2021/02087 (2013.01); G10L 21/0232 (2013.01); H04R 3/005 (2013.01); H04R 3/12 (2013.01); H04R 27/00 (2013.01); H04R 29/007 (2013.01); H04R 2227/003 (2013.01); H04R 2227/005 (2013.01); H04R 2420/03 (2013.01); H04R 2420/07 (2013.01); H04R 2430/23 (2013.01)]

20 Claims

1. A system comprising:

at least one playback device comprising at least one headphone, at least one microphone and one or more audio transducers;

a mobile device;

one or more wireless network interfaces;

at least one processor; and

data storage storing program instructions that are executable by the at least one processor to cause the system to perform functions comprising:

playing back at least one audio signal via the one or more audio transducers of the at least one playback device;

while playing back the at least one audio signal, capturing, via the at least one microphone, audio, wherein at least a portion of the captured audio represents sound produced by the one or more audio transducers in playing back the at least one audio signal;

transforming into a short time Fourier transform (STFT) domain the captured audio to generate a measured signal representing actual acoustic echo;

transforming into the STFT domain the at least one audio signal being played back by the at least one playback device to generate a reference signal;

during each nth iteration of an acoustic echo canceller (AEC):

determining an nth frame of an output signal, wherein determining the nth frame of the output signal comprises:

(i) generating an nth frame of a model signal representing estimated acoustic echo by passing an nth frame of the reference signal through an nth instance of an adaptive filter; and

(ii) generating the nth frame of the output signal by differencing the nth frame of the model signal and an nth frame of the measured signal;

determining an (n+1)th instance of the adaptive filter for a next iteration of the AEC, wherein determining the (n+1)th instance of the adaptive filter for the next iteration of the AEC comprises:

(i) estimating an nth frame of an error signal, the nth frame of the error signal representing a difference between the nth frame of the measured signal and the nth frame of the model signal;

(ii) converting the nth frame of an error signal to an nth update filter;

(iii) deactivating inactive portions of the nth update filter, the inactive portions having less than a threshold energy; and

(iv) generating the (n+1)th instance of the adaptive filter for the next iteration of the AEC by summing the nth instance of the adaptive filter with the nth update filter; and

sending, via the one or more wireless network interfaces, the output signal as a voice input to one or more voice assistants for processing of the voice input.
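
The per-iteration steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the frames are already in the STFT domain (lists of complex bins), uses a single complex tap per frequency bin, and converts the error frame to an update filter with an NLMS-style normalized step. None of these choices is specified by the claim. The names `aec_iteration`, `step`, `threshold`, and `eps` are hypothetical.

```python
def aec_iteration(filt, ref_frame, meas_frame, step=1.0, threshold=1e-6, eps=1e-12):
    """Run one AEC iteration on a single STFT frame.

    filt, ref_frame, meas_frame: equal-length lists of complex values,
    one per frequency bin. Returns (output_frame, next_filter).
    """
    # Model signal: pass the reference frame through the current filter
    # instance (assumption: one complex tap per bin).
    model = [h * x for h, x in zip(filt, ref_frame)]

    # Output signal: difference between the measured and model frames.
    output = [d - y for d, y in zip(meas_frame, model)]

    # Error signal: the same measured-minus-model difference.
    error = output

    # Update filter from the error frame (assumption: NLMS-style
    # normalization by the reference energy in each bin).
    update = [
        step * e * x.conjugate() / (abs(x) ** 2 + eps)
        for e, x in zip(error, ref_frame)
    ]

    # Deactivate inactive portions: zero the bins whose update energy
    # is below the threshold.
    update = [u if abs(u) ** 2 >= threshold else 0j for u in update]

    # Next filter instance: current filter plus the (sparsified) update.
    next_filt = [h + u for h, u in zip(filt, update)]
    return output, next_filt


# Toy usage: a fixed, made-up echo path and a single reference frame.
true_path = [0.5 + 0.2j, -0.3j, 0j]
ref = [1 + 1j, 2 - 1j, 0.5 + 0j]
meas = [h * x for h, x in zip(true_path, ref)]

filt = [0j, 0j, 0j]
out1, filt = aec_iteration(filt, ref, meas)
out2, filt = aec_iteration(filt, ref, meas)
```

In this toy setup the measured frame contains only echo, so the residual in `out2` is close to zero after one update. The third bin, whose update carries no energy, is left deactivated. A practical canceller would use multiple taps per bin and a smaller step size, and it would have to cope with near-end speech.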