US 12,217,765 B2
Robust short-time Fourier transform acoustic echo cancellation during audio playback
Daniele Giacobello, Los Angeles, CA (US)
Assigned to Sonos, Inc., Goleta, CA (US)
Filed by Sonos, Inc., Santa Barbara, CA (US)
Filed on May 5, 2023, as Appl. No. 18/313,013.
Application 18/313,013 is a continuation of application No. 17/327,911, filed on May 24, 2021, granted, now 11,646,045.
Application 17/327,911 is a continuation of application No. 16/600,644, filed on Oct. 14, 2019, granted, now 11,017,789, issued on May 25, 2021.
Application 16/600,644 is a continuation of application No. 15/717,621, filed on Sep. 27, 2017, granted, now 10,446,165, issued on Oct. 15, 2019.
Prior Publication US 2023/0395088 A1, Dec. 7, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 21/02 (2013.01); G10K 11/178 (2006.01); G10L 21/0208 (2013.01); G10L 21/0232 (2013.01); H04M 9/08 (2006.01); H04R 3/00 (2006.01); H04R 3/12 (2006.01); H04R 27/00 (2006.01); H04R 29/00 (2006.01)
CPC G10L 21/02 (2013.01) [G10K 11/178 (2013.01); G10L 21/0208 (2013.01); H04M 9/082 (2013.01); G10K 2210/3012 (2013.01); G10K 2210/3028 (2013.01); G10K 2210/505 (2013.01); G10L 2021/02087 (2013.01); G10L 21/0232 (2013.01); H04R 3/005 (2013.01); H04R 3/12 (2013.01); H04R 27/00 (2013.01); H04R 29/007 (2013.01); H04R 2227/003 (2013.01); H04R 2227/005 (2013.01); H04R 2420/03 (2013.01); H04R 2420/07 (2013.01); H04R 2430/23 (2013.01)] 20 Claims
1. A system comprising:
at least one playback device comprising at least one headphone, at least one microphone and one or more audio transducers;
a mobile device;
one or more wireless network interfaces;
at least one processor; and
data storage storing program instructions that are executable by the at least one processor to cause the system to perform functions comprising:
playing back at least one audio signal via the one or more audio transducers of the at least one playback device;
while playing back the at least one audio signal, capturing, via the at least one microphone, audio, wherein at least a portion of the captured audio represents sound produced by the one or more audio transducers in playing back the at least one audio signal;
transforming into a short time Fourier transform (STFT) domain the captured audio to generate a measured signal representing actual acoustic echo;
transforming into the STFT domain the at least one audio signal being played back by the at least one playback device to generate a reference signal;
during each nth iteration of an acoustic echo canceller (AEC):
determining an nth frame of an output signal, wherein determining the nth frame of the output signal comprises:
(i) generating an nth frame of a model signal representing estimated acoustic echo by passing an nth frame of the reference signal through an nth instance of an adaptive filter; and
(ii) generating the nth frame of the output signal by differencing the nth frame of the model signal and an nth frame of the measured signal;
determining an n+1th instance of the adaptive filter for a next iteration of the AEC, wherein determining the n+1th instance of the adaptive filter for the next iteration of the AEC comprises:
(i) estimating an nth frame of an error signal, the nth frame of the error signal representing a difference between the nth frame of the measured signal and the nth frame of the model signal;
(ii) converting the nth frame of an error signal to an nth update filter;
(iii) deactivating inactive portions of the nth update filter, the inactive portions having less than a threshold energy; and
(iv) generating the n+1th instance of the adaptive filter for the next iteration of the AEC by summing the nth instance of the adaptive filter with the nth update filter; and
sending, via the one or more wireless network interfaces, the output signal as a voice input to one or more voice assistants for processing of the voice input.
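The per-iteration steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and the claim does not specify several of the choices made here. The following are all assumptions introduced for illustration: the single complex coefficient per frequency bin, the NLMS-style normalization used to convert the error frame into an update filter, the treatment of each bin as one "portion" for the energy threshold, the parameter names `step`, `threshold`, and `eps`, and the naive Hann-windowed DFT in `stft_frame`.

```python
import cmath
import math


def stft_frame(samples):
    """Hann-window one block of real samples and return DFT bins 0..N/2.

    Naive O(N^2) DFT, used only to keep the sketch dependency-free.
    """
    n = len(samples)
    windowed = [
        s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
        for i, s in enumerate(samples)
    ]
    return [
        sum(w * cmath.exp(-2j * math.pi * k * i / n) for i, w in enumerate(windowed))
        for k in range(n // 2 + 1)
    ]


def aec_iteration(filt, ref_frame, meas_frame, step=0.5, threshold=1e-8, eps=1e-12):
    """Run one AEC iteration and return (output_frame, next_filter).

    filt, ref_frame and meas_frame are equal-length lists of complex
    values, one per STFT bin (assumed single-tap filter per bin).
    """
    # Model signal: reference frame passed through the current filter instance.
    model = [h * x for h, x in zip(filt, ref_frame)]

    # Output frame: difference between the measured and model frames.
    output = [d - m for d, m in zip(meas_frame, model)]

    # Error frame: in this simplified sketch it equals the output frame.
    error = list(output)

    # Convert the error frame to an update filter (assumed NLMS-style rule).
    update = [
        step * x.conjugate() * e / (abs(x) ** 2 + eps)
        for x, e in zip(ref_frame, error)
    ]

    # Deactivate portions of the update filter whose energy is below threshold.
    update = [u if abs(u) ** 2 >= threshold else 0j for u in update]

    # Next filter instance: current filter summed with the update filter.
    next_filt = [h + u for h, u in zip(filt, update)]
    return output, next_filt
```

In use, each block of captured audio and each block of the played-back audio signal would be passed through `stft_frame` to obtain the measured and reference frames, and `aec_iteration` would be called once per frame with the filter returned by the previous call. Once the estimated filter is close to the echo path, the updates fall below the threshold and are zeroed, so the filter stops changing.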