US 12,149,897 B2
Audio playback settings for voice interaction
Klaus Hartung, Santa Barbara, CA (US); and Romi Kadri, Cambridge, MA (US)
Assigned to Sonos, Inc., Goleta, CA (US)
Filed by Sonos, Inc., Santa Barbara, CA (US)
Filed on May 1, 2023, as Appl. No. 18/309,939.
Application 18/309,939 is a continuation of application No. 16/806,747, filed on Mar. 2, 2020, granted, now 11,641,559.
Application 16/806,747 is a continuation of application No. 15/946,585, filed on Apr. 5, 2018, granted, now 10,582,322, issued on Mar. 3, 2020.
Application 15/946,585 is a continuation of application No. 15/277,810, filed on Sep. 27, 2016, granted, now 9,942,678, issued on Apr. 10, 2018.
Prior Publication US 2023/0379644 A1, Nov. 23, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. H04R 29/00 (2006.01); G06F 3/16 (2006.01); G10L 15/22 (2006.01); H03G 3/32 (2006.01); H03G 3/34 (2006.01); H04R 27/00 (2006.01); G10L 15/08 (2006.01); G10L 25/84 (2013.01)

CPC H04R 29/007 (2013.01) [G06F 3/165 (2013.01); G10L 15/22 (2013.01); H03G 3/32 (2013.01); H03G 3/342 (2013.01); H04R 27/00 (2013.01); G10L 2015/088 (2013.01); G10L 25/84 (2013.01); H04R 2227/003 (2013.01); H04R 2227/005 (2013.01); H04R 2420/07 (2013.01); H04R 2430/01 (2013.01)]

20 Claims

1. A network microphone device comprising:

a network interface;

one or more microphones;

one or more audio transducers;

at least one processor;

a housing carrying at least the network interface, the one or more microphones, and the one or more audio transducers; and

at least one non-transitory computer-readable medium comprising program instructions that are executable by the at least one processor such that the network microphone device is configured to:

while at least one playback device is playing back first audio in a given environment:

(a) record, via the one or more microphones, audio into a buffer;

(b) detect, within the recorded audio, a wake word to invoke a voice assistant;

(c) in response to detection of the wake word: (i) cause, via the network interface, the at least one playback device to duck a first portion of the first audio while recording, into the buffer, audio representing a voice input to the voice assistant and (ii) send, to the voice assistant, the recorded audio in the buffer representing the voice input to the voice assistant; and

(d) receive, from the voice assistant in response to the voice input, second audio representing a spoken response to the voice input; and

in response to receipt of the second audio representing the spoken response to the voice input: (1) cause, via the network interface, the at least one playback device to duck a second portion of the first audio and (2) play back the received second audio via the one or more audio transducers concurrently with playback of the ducked second portion of the first audio by the at least one playback device.
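
The control flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the class names, the text "frames" standing in for recorded audio, the `<silence>` end-of-utterance marker, the ducking factor, and the restoration of volume between the two ducked portions are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field

END_OF_UTTERANCE = "<silence>"  # assumed marker; the claim does not define one


@dataclass
class PlaybackDevice:
    """Hypothetical stand-in for a playback device reached over the network."""
    volume: float = 1.0
    log: list = field(default_factory=list)

    def duck(self, label: str, factor: float = 0.3) -> None:
        # Ducking lowers the level of the first audio instead of stopping it.
        self.volume = factor
        self.log.append(f"duck:{label}")

    def restore(self) -> None:
        # Assumed behavior: the claim does not say when ducking ends.
        self.volume = 1.0
        self.log.append("restore")


class EchoAssistant:
    """Hypothetical voice assistant that returns 'second audio' as text."""

    def respond(self, voice_input: list) -> str:
        return "response to: " + " ".join(voice_input)


@dataclass
class NetworkMicrophoneDevice:
    player: PlaybackDevice
    assistant: EchoAssistant
    wake_word: str = "hey"
    buffer: list = field(default_factory=list)
    spoken: list = field(default_factory=list)

    def run(self, frames: list) -> None:
        start = None  # buffer index where the voice input begins
        for frame in frames:
            if start is not None and frame == END_OF_UTTERANCE:
                break
            self.buffer.append(frame)                # (a) record into the buffer
            if start is None and frame == self.wake_word:  # (b) detect wake word
                start = len(self.buffer)
                self.player.duck("voice-input")      # (c)(i) duck first portion
        if start is None:
            return                                   # no wake word was detected
        voice_input = self.buffer[start:]
        self.player.restore()
        # (c)(ii) send the buffered voice input; (d) receive the second audio
        second_audio = self.assistant.respond(voice_input)
        self.player.duck("response")                 # (1) duck second portion
        self.spoken.append(second_audio)             # (2) play response locally
        self.player.restore()
```

In this sketch the microphone device keeps recording into the same buffer after the wake word, so the voice input is simply the slice of the buffer that follows it. The playback device is ducked twice: once while the voice input is captured, and again while the spoken response is played through the microphone device's own transducers.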