US 11,854,547 B2
Network microphone device with command keyword eventing
Connor Smith, New Hudson, MI (US); John Tolomei, Renton, WA (US); and Kurt Soto, Ventura, CA (US)
Assigned to Sonos, Inc., Santa Barbara, CA (US)
Filed by Sonos, Inc., Santa Barbara, CA (US)
Filed on Dec. 13, 2021, as Appl. No. 17/549,034.
Application 17/549,034 is a continuation of application No. 16/439,032, filed on Jun. 12, 2019, granted, now 11,200,894.
Prior Publication US 2022/0277742 A1, Sep. 1, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 15/22 (2006.01); G10L 15/30 (2013.01); G10L 15/18 (2013.01); G06F 3/16 (2006.01); G10L 15/08 (2006.01)

CPC G10L 15/22 (2013.01) [G06F 3/165 (2013.01); G06F 3/167 (2013.01); G10L 15/1815 (2013.01); G10L 15/30 (2013.01); G10L 2015/088 (2013.01); G10L 2015/223 (2013.01)]

20 Claims

1. A playback device comprising:

a network interface;

at least one microphone configured to detect sound;

at least one speaker;

at least one processor; and

a housing carrying the network interface, the at least one microphone, the at least one speaker, the at least one processor, and data storage including instructions that are executable by the at least one processor such that the playback device is configured to:

capture, via the at least one microphone, at least one input data stream;

detect a wake word in a first portion of the at least one input data stream;

based on detection of the wake word, trigger a wake-word event based on a first voice input captured via the at least one microphone, wherein the first voice input comprises the wake word and an utterance, and wherein the wake word does not correspond to a command;

stream, via the network interface, sound data representing at least a portion of the first voice input to one or more remote servers of a voice assistant service for remote processing via a voice assistant of the one or more remote servers;

after the first voice input is processed, detect a first command keyword in a second portion of the at least one input data stream, wherein the first command keyword is preceded in the at least one input data stream by a period of inactivity that excludes the wake word;

based on detection of the first command keyword, trigger a first command keyword event to locally process a second voice input represented in the second portion of the at least one input data stream, wherein the second voice input comprises a first command keyword and at least one keyword from a set of keywords supported by a local voice assistant, wherein the first command keyword is one of a plurality of command keywords supported by the local voice assistant of the playback device, and wherein the second voice input excludes the wake word;

determine, via the local voice assistant, (i) a particular command corresponding to the first command keyword and (ii) one or more parameters corresponding to the at least one keyword, the one or more parameters modifying the particular command; and

cause at least one local network device to carry out the particular command according to the one or more parameters.
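
The two event paths recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes each voice input arrives as an already-transcribed string, with the period of inactivity modeled as the boundary between separate strings. The wake word, command keywords, parameter keywords, and the names `Event` and `handle_voice_input` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical vocabularies; none of these values come from the patent.
WAKE_WORDS = {"alexa"}
COMMAND_KEYWORDS = {"play": "PLAY", "pause": "PAUSE", "skip": "NEXT_TRACK"}
PARAMETER_KEYWORDS = {
    "kitchen": ("zone", "Kitchen"),
    "bedroom": ("zone", "Bedroom"),
    "jazz": ("genre", "jazz"),
}


@dataclass
class Event:
    kind: str                      # "wake_word" or "command_keyword"
    route: str                     # "remote" (voice assistant service) or "local"
    command: Optional[str] = None  # set only for command keyword events
    parameters: Dict[str, str] = field(default_factory=dict)


def handle_voice_input(utterance: str) -> Optional[Event]:
    """Classify one voice input and decide where it would be processed."""
    tokens = [t.strip(",.!?") for t in utterance.lower().split()]

    # Wake-word path: the input would be streamed to the remote service.
    if any(t in WAKE_WORDS for t in tokens):
        return Event(kind="wake_word", route="remote")

    # Command-keyword path: no wake word present, so process locally.
    command = next(
        (COMMAND_KEYWORDS[t] for t in tokens if t in COMMAND_KEYWORDS), None
    )
    if command is None:
        return None  # neither a wake word nor a supported command keyword

    # Remaining supported keywords become parameters that modify the command.
    parameters = dict(PARAMETER_KEYWORDS[t] for t in tokens if t in PARAMETER_KEYWORDS)
    return Event(kind="command_keyword", route="local",
                 command=command, parameters=parameters)
```

In this sketch, an input such as "Alexa, what's the weather" yields a wake-word event routed to the remote service, while "play jazz in the kitchen" yields a locally processed command keyword event with the command `PLAY` and the parameters genre and zone. The sketch does not enforce the claim's requirement that the second voice input contain at least one supported keyword; a command keyword alone produces an event with empty parameters.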