| CPC G10L 21/02 (2013.01) [G10K 11/178 (2013.01); G10L 21/0208 (2013.01); H04M 9/082 (2013.01); G10K 2210/3012 (2013.01); G10K 2210/3028 (2013.01); G10K 2210/505 (2013.01); G10L 2021/02087 (2013.01); G10L 21/0232 (2013.01); H04R 3/005 (2013.01); H04R 3/12 (2013.01); H04R 27/00 (2013.01); H04R 29/007 (2013.01); H04R 2227/003 (2013.01); H04R 2227/005 (2013.01); H04R 2420/03 (2013.01); H04R 2420/07 (2013.01); H04R 2430/23 (2013.01)] | 20 Claims |

|
1. A system comprising:
at least one playback device comprising at least one headphone, at least one microphone and one or more audio transducers;
a mobile device;
one or more wireless network interfaces;
at least one processor; and
data storage storing program instructions that are executable by the at least one processor to cause the system to perform functions comprising:
playing back at least one audio signal via the one or more audio transducers of the at least one playback device;
while playing back the at least one audio signal, capturing, via the at least one microphone, audio, wherein at least a portion of the captured audio represents sound produced by the one or more audio transducers in playing back the at least one audio signal;
transforming into a short time Fourier transform (STFT) domain the captured audio to generate a measured signal representing actual acoustic echo;
transforming into the STFT domain the at least one audio signal being played back by the at least one playback device to generate a reference signal;
during each nth iteration of an acoustic echo canceller (AEC):
determining an nth frame of an output signal, wherein determining the nth frame of the output signal comprises:
(i) generating an nth frame of a model signal representing estimated acoustic echo by passing an nth frame of the reference signal through an nth instance of an adaptive filter; and
(ii) generating the nth frame of the output signal by differencing the nth frame of the model signal and an nth frame of the measured signal;
determining an n+1th instance of the adaptive filter for a next iteration of the AEC, wherein determining the n+1th instance of the adaptive filter for the next iteration of the AEC comprises:
(i) estimating an nth frame of an error signal, the nth frame of the error signal representing a difference between the nth frame of the measured signal and the nth frame of the model signal;
(ii) converting the nth frame of the error signal to an nth update filter;
(iii) deactivating inactive portions of the nth update filter, the inactive portions having less than a threshold energy; and
(iv) generating the n+1th instance of the adaptive filter for the next iteration of the AEC by summing the nth instance of the adaptive filter with the nth update filter; and
sending, via the one or more wireless network interfaces, the output signal as a voice input to one or more voice assistants for processing of the voice input.
|
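
The per-iteration AEC steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: the claim does not say how the error frame is converted to an update filter, so the NLMS-style normalized step used here is an assumption, as are the single complex coefficient per frequency bin, the function name `aec_iteration`, and the parameters `mu`, `eps`, and `energy_threshold`. Each frame is assumed to be already in the STFT domain, given as a list of complex bin values.

```python
def aec_iteration(h, x, d, mu=0.5, eps=1e-8, energy_threshold=1e-6):
    """One hypothetical AEC iteration over a single STFT frame.

    h: nth instance of the adaptive filter (one complex coefficient per bin)
    x: nth frame of the reference signal (STFT of the audio being played back)
    d: nth frame of the measured signal (STFT of the captured audio)
    Returns (output_frame, next_filter).
    """
    # Model signal: estimated echo, reference frame passed through the filter.
    y = [hk * xk for hk, xk in zip(h, x)]

    # Output (and error) frame: measured signal minus model signal.
    e = [dk - yk for dk, yk in zip(d, y)]

    # Update filter from the error frame (NLMS-style step: an assumption).
    u = [mu * xk.conjugate() * ek / (abs(xk) ** 2 + eps)
         for xk, ek in zip(x, e)]

    # Deactivate inactive portions: zero bins whose energy is below threshold.
    u = [uk if abs(uk) ** 2 >= energy_threshold else 0j for uk in u]

    # Next filter instance: current filter summed with the update filter.
    h_next = [hk + uk for hk, uk in zip(h, u)]
    return e, h_next


def run_aec(reference_frames, measured_frames, num_bins):
    """Run the iteration over a sequence of frames, starting from a zero filter."""
    h = [0j] * num_bins
    output = []
    for x, d in zip(reference_frames, measured_frames):
        e, h = aec_iteration(h, x, d)
        output.append(e)
    return output, h
```

In this sketch, zeroing low-energy bins of the update filter means that bins where the echo estimate has already converged are left untouched in later iterations, which is one plausible reading of "deactivating inactive portions". The output frames returned by `run_aec` correspond to the signal that the claim sends on as the voice input.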