| CPC H04R 29/007 (2013.01) [G06F 3/165 (2013.01); G10L 15/22 (2013.01); H03G 3/32 (2013.01); H03G 3/342 (2013.01); H04R 27/00 (2013.01); G10L 2015/088 (2013.01); G10L 25/84 (2013.01); H04R 2227/003 (2013.01); H04R 2227/005 (2013.01); H04R 2420/07 (2013.01); H04R 2430/01 (2013.01)] | 20 Claims |

|
1. A network microphone device comprising:
a network interface;
one or more microphones;
one or more audio transducers;
at least one processor;
a housing carrying at least the network interface, the one or more microphones, and the one or more audio transducers, and
at least one non-transitory computer-readable medium comprising program instructions that are executable by the at least one processor such that the network microphone device is configured to:
while at least one playback device is playing back first audio in a given environment:
(a) record, via the one or more microphones, audio into a buffer;
(b) detect, within the recorded audio, a wake word to invoke a voice assistant;
(c) in response to detection of the wake word: (i) cause, via the network interface, the at least one playback device to duck a first portion of the first audio while recording, into the buffer, audio representing a voice input to the voice assistant and (ii) send, to the voice assistant, the recorded audio in the buffer representing the voice input to the voice assistant; and
(d) receive, from the voice assistant in response to the voice input, second audio representing a spoken response to the voice input; and
in response to receipt of the second audio representing the spoken response to the voice input: (1) cause, via the network interface, the at least one playback device to duck a second portion of the first audio and (2) play back the received second audio via the one or more audio transducers concurrently with playback of the ducked second portion of the first audio by the at least one playback device.
|
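
The sequence recited in claim 1 (buffer audio, detect the wake word, duck a first portion of the first audio while capturing the voice input, send the input to the voice assistant, then duck a second portion while playing the spoken response) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the classes `PlaybackDevice`, `VoiceAssistant`, and `NetworkMicrophoneDevice`, the `duck`/`restore` methods, the 0.3 ducking factor, and the use of text tokens in place of audio frames are all assumptions made for illustration. Restoring the volume after the response is also an assumption; the claim does not recite it.

```python
from dataclasses import dataclass, field


@dataclass
class PlaybackDevice:
    """Hypothetical stand-in for a networked playback device playing the first audio."""
    volume: float = 1.0
    log: list = field(default_factory=list)

    def duck(self, portion: str, factor: float = 0.3) -> None:
        # Lower the volume for the named portion of the first audio.
        self.volume = factor
        self.log.append(f"duck:{portion}")

    def restore(self) -> None:
        # Assumption: volume returns to normal afterwards (not recited in the claim).
        self.volume = 1.0
        self.log.append("restore")


class VoiceAssistant:
    """Hypothetical voice assistant returning a canned 'spoken response'."""

    def respond(self, voice_input: list) -> str:
        return "response to: " + " ".join(voice_input)


class NetworkMicrophoneDevice:
    """Hypothetical model of the claimed device; tokens stand in for audio frames."""

    def __init__(self, playback: PlaybackDevice, assistant: VoiceAssistant,
                 wake_word: str = "hey") -> None:
        self.playback = playback
        self.assistant = assistant
        self.wake_word = wake_word
        self.buffer = []   # recorded audio
        self.played = []   # audio emitted by the device's own transducers

    def handle(self, frames):
        frames = iter(frames)
        for frame in frames:
            self.buffer.append(frame)        # (a) record audio into the buffer
            if frame == self.wake_word:      # (b) detect the wake word
                break
        else:
            return None                      # no wake word, nothing to do

        self.playback.duck("first")          # (c)(i) duck while recording the voice input
        voice_input = []
        for frame in frames:
            self.buffer.append(frame)
            voice_input.append(frame)

        # (c)(ii) send the voice input; (d) receive the second audio
        second_audio = self.assistant.respond(voice_input)

        self.playback.duck("second")         # (1) duck a second portion of the first audio
        self.played.append(second_audio)     # (2) play the response over the ducked audio
        self.playback.restore()
        return second_audio


speaker = PlaybackDevice()
nmd = NetworkMicrophoneDevice(speaker, VoiceAssistant())
reply = nmd.handle(["music", "hey", "what", "time"])
```

In this sketch the two ducking requests are issued as separate events, mirroring the claim's distinction between the first portion (ducked during voice capture) and the second portion (ducked during the spoken response). A real device would send these requests over its network interface rather than calling a local object.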