US 12,217,748 B2
Systems and methods of multiple voice services
Klaus Hartung, Boston, MA (US); and Daniele Giacobello, Los Angeles, CA (US)
Assigned to Sonos, Inc., Goleta, CA (US)
Filed by Sonos, Inc., Santa Barbara, CA (US)
Filed on Nov. 22, 2021, as Appl. No. 17/532,744.
Application 17/532,744 is a continuation of application No. 15/936,177, filed on Mar. 26, 2018, granted, now Pat. No. 11,183,181.
Claims priority of provisional application 62/477,403, filed on Mar. 27, 2017.
Prior Publication US 2022/0157307 A1, May 19, 2022
Int. Cl. G10L 15/22 (2006.01); G06F 3/16 (2006.01); G10L 15/08 (2006.01); G10L 15/14 (2006.01); G10L 15/30 (2013.01); G10L 15/32 (2013.01); G10L 25/51 (2013.01)

CPC G10L 15/22 (2013.01) [G06F 3/167 (2013.01); G10L 15/08 (2013.01); G10L 15/30 (2013.01); G10L 25/51 (2013.01); G10L 2015/088 (2013.01); G10L 15/14 (2013.01); G10L 2015/223 (2013.01); G10L 15/32 (2013.01)]

23 Claims

1. A network microphone device comprising:

at least one microphone;

a network interface;

at least one processor;

at least one non-transitory computer-readable medium; and

program instructions stored on the at least one non-transitory computer-readable medium that are executable by the at least one processor such that the network microphone device is configured to:

receive, at a first time via the at least one microphone, first voice data indicating a first voice input, wherein the first voice data includes a first portion representing an activation word corresponding to one of a plurality of voice services and a second portion representing a first voice command, wherein the plurality of voice services are externally registered to a media playback system associated with the networked microphone device;

identify, from among the plurality of voice services, a particular voice service to process the first voice input, wherein the identifying comprises determining a closest match of the first portion of the first voice data representing the activation word with corresponding activation word data stored in a recognition dataset on the network microphone device;

based on the determined closest match, select the particular voice service and forgo selection of another voice service for processing the first voice input;

transmit, via the network interface, at least the second portion of the first voice data representing the first voice command to the particular voice service;

receive, at a second time after the first time via the at least one microphone, second voice data indicating a second voice input, wherein at least a portion of the second voice data represents a second voice command different from the first voice command;

determine that the second voice data was received less than a threshold period of time after the first voice data was received;

based at least on (i) the selection of the particular voice service for processing the first voice input and (ii) the determination that the second voice data was received less than a threshold period of time after the first voice data was received, select a voice service from among the plurality of voice services for processing the second voice input; and

transmit, via the network interface, at least the portion of the second voice data representing the second voice command to the voice service selected for processing the second voice input.
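The selection logic recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The class name `ServiceSelector`, the activation words, the service names, and the 8-second threshold are all hypothetical. The sketch compares text strings with Python's `difflib` as a stand-in for matching audio against a recognition dataset. It also assumes that a follow-up input without an activation word is routed to the previously selected service, which is only one way to satisfy the claim's "select a voice service" step.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class ServiceSelector:
    # Hypothetical recognition dataset: activation word -> voice service name.
    recognition_dataset: dict[str, str]
    # Hypothetical value; the claim only recites "a threshold period of time".
    threshold_s: float = 8.0
    last_service: Optional[str] = None
    last_time: Optional[float] = None

    def closest_service(self, heard: str) -> str:
        """Return the service whose stored activation word best matches `heard`."""
        best_word = max(
            self.recognition_dataset,
            key=lambda w: SequenceMatcher(None, heard.lower(), w.lower()).ratio(),
        )
        return self.recognition_dataset[best_word]

    def select(self, now: float, activation: Optional[str]) -> Optional[str]:
        """Pick a service for an input received at time `now` (seconds)."""
        if activation is not None:
            # First-input path: closest match selects one service, forgoing the others.
            service = self.closest_service(activation)
        elif (
            self.last_service is not None
            and self.last_time is not None
            and now - self.last_time < self.threshold_s
        ):
            # Follow-up path (assumption): reuse the earlier selection within the window.
            service = self.last_service
        else:
            # No activation word and no recent selection: nothing to route to.
            return None
        self.last_service, self.last_time = service, now
        return service


selector = ServiceSelector({"hey alpha": "service_a", "ok beta": "service_b"})
first = selector.select(0.0, "hey alfa")   # imperfect match still resolves
second = selector.select(5.0, None)        # follow-up inside the threshold
third = selector.select(20.0, None)        # follow-up outside the threshold
```

In this sketch the timestamp is refreshed on every successful selection, so a chain of follow-ups extends the window. Measuring only from the first input would be an equally plausible reading of the claim.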