US 12,118,273 B2
Local voice data processing
Sebastien Maury, Paris (FR); Joseph Dureau, Paris (FR); Thibaut Lorrain, Paris (FR); and Do Kyun Kim, Paris (FR)
Assigned to Sonos, Inc., Goleta, CA (US)
Filed by Sonos, Inc., Santa Barbara, CA (US)
Filed on Jan. 13, 2023, as Appl. No. 18/154,228.
Application 18/154,228 is a continuation of application No. 17/163,506, filed on Jan. 31, 2021, granted, now 11,556,307.
Claims priority of provisional application 62/968,675, filed on Jan. 31, 2020.
Prior Publication US 2023/0289135 A1, Sep. 14, 2023
Int. Cl. G10L 15/22 (2006.01); G06F 3/16 (2006.01); G10L 17/00 (2013.01); G10L 17/02 (2013.01); G10L 17/06 (2013.01); G10L 15/08 (2006.01)
CPC G06F 3/167 (2013.01) [G10L 15/22 (2013.01); G10L 17/02 (2013.01); G10L 17/06 (2013.01); G10L 2015/088 (2013.01)] 20 Claims
 
1. A system comprising:
a microcontroller unit (MCU);
a network interface;
at least one processor; and
at least one non-transitory computer-readable medium comprising program instructions that are executable by the at least one processor such that the system is configured to:
store, in data storage of the MCU, data representing local keywords corresponding to smart Internet-of-Things (IoT) commands;
capture an input sound data stream from one or more microphones;
transmit, via the network interface over a local area network to a hub device, data representing the input sound data stream for voice processing of one or more first voice inputs in the input sound data stream into respective commands by a voice assistant of the hub device, wherein the MCU and the hub device are connected to the local area network;
monitor the captured input sound data stream for the local keywords corresponding to the smart IoT commands;
detect at least one keyword of the local keywords in a portion of the captured input sound data stream comprising a second voice input;
determine at least one particular smart IoT command corresponding to the at least one keyword;
send, via the network interface to the hub device, an indication that the MCU is processing the portion of the captured input sound data stream comprising the second voice input, wherein the hub device forgoes processing of the portion of the captured input sound data stream comprising the second voice input when the indication is received; and
cause a smart IoT device to carry out the at least one particular smart IoT command.
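
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Several simplifications are assumptions made only for illustration: each portion of the input sound data stream is modeled as already-transcribed text rather than audio, keyword detection is a plain substring match, and the names `LOCAL_KEYWORDS`, `Hub`, `IoTDevice`, `process_portion`, and the command strings are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical keyword table stored on the MCU. The claim only requires that
# local keywords correspond to smart IoT commands; these entries are made up.
LOCAL_KEYWORDS = {
    "lights on": "light.turn_on",
    "lights off": "light.turn_off",
}


@dataclass
class Hub:
    """Stand-in for the hub device reachable over the local area network."""
    received: list = field(default_factory=list)  # stream data sent by the MCU
    skipped: set = field(default_factory=set)     # portions the MCU is handling

    def receive_stream(self, portion_id, text):
        self.received.append((portion_id, text))

    def receive_indication(self, portion_id):
        self.skipped.add(portion_id)


@dataclass
class IoTDevice:
    """Stand-in for a smart IoT device; records the commands it carried out."""
    executed: list = field(default_factory=list)

    def carry_out(self, command):
        self.executed.append(command)


def process_portion(portion_id, text, hub, device, keywords=LOCAL_KEYWORDS):
    """Handle one captured portion of the stream on the MCU side."""
    # Transmit data representing the stream to the hub for voice processing.
    hub.receive_stream(portion_id, text)

    # Monitor the portion for local keywords and map them to commands.
    lowered = text.lower()
    commands = [cmd for kw, cmd in keywords.items() if kw in lowered]
    if not commands:
        return []  # no local keyword: the hub's voice assistant handles it

    # Tell the hub the MCU is processing this portion, then act locally.
    hub.receive_indication(portion_id)
    for cmd in commands:
        device.carry_out(cmd)
    return commands
```

In this sketch a portion such as "Lights on please" is both forwarded to the hub and handled locally, and the hub is told to skip it. A portion with no local keyword is left entirely to the hub's voice assistant.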
 
11. A microcontroller unit (MCU) comprising:
a network interface;
at least one processor; and
at least one non-transitory computer-readable medium comprising program instructions that are executable by the at least one processor such that the MCU is configured to:
store, in data storage of the MCU, data representing local keywords corresponding to smart Internet-of-Things (IoT) commands;
capture an input sound data stream from one or more microphones;
transmit, via the network interface over a local area network to a hub device, data representing the input sound data stream for voice processing of one or more first voice inputs in the input sound data stream into respective commands by a voice assistant of the hub device, wherein the MCU and the hub device are connected to the local area network;
monitor the captured input sound data stream for the local keywords corresponding to the smart IoT commands;
detect at least one keyword of the local keywords in a portion of the captured input sound data stream comprising a second voice input;
determine at least one particular smart IoT command corresponding to the at least one keyword;
send, via the network interface to the hub device, an indication that the MCU is processing the portion of the captured input sound data stream comprising the second voice input, wherein the hub device forgoes processing of the portion of the captured input sound data stream comprising the second voice input when the indication is received; and
cause a smart IoT device to carry out the at least one particular smart IoT command.
 
20. At least one non-transitory computer-readable medium comprising program instructions that are executable by at least one processor such that a microcontroller unit (MCU) is configured to:
store, in data storage of the MCU, data representing local keywords corresponding to smart Internet-of-Things (IoT) commands;
capture an input sound data stream from one or more microphones;
transmit, via a network interface over a local area network to a hub device, data representing the input sound data stream for voice processing of one or more first voice inputs in the input sound data stream into respective commands by a voice assistant of the hub device;
monitor the captured input sound data stream for the local keywords corresponding to the smart IoT commands;
detect at least one keyword of the local keywords in a portion of the captured input sound data stream comprising a second voice input;
determine at least one particular smart IoT command corresponding to the at least one keyword;
send, via the network interface to the hub device, an indication that the MCU is processing the portion of the captured input sound data stream comprising the second voice input, wherein the hub device forgoes processing of the portion of the captured input sound data stream comprising the second voice input when the indication is received; and
cause a smart IoT device to carry out the at least one particular smart IoT command.
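
The "wherein" clause shared by the independent claims, under which the hub device forgoes processing of a portion once the MCU's indication is received, can be sketched from the hub's side as follows. This is an illustrative assumption about one possible arrangement, not the claimed implementation: the class `HubVoiceAssistant`, its method names, and the use of integer portion identifiers are all hypothetical.

```python
class HubVoiceAssistant:
    """Hypothetical hub-side bookkeeping for stream portions sent by the MCU."""

    def __init__(self):
        self.pending = {}     # portion_id -> data awaiting voice processing
        self.claimed = set()  # portion ids the MCU reported it is processing

    def on_stream_data(self, portion_id, data):
        # Queue the portion unless the MCU has already claimed it.
        if portion_id not in self.claimed:
            self.pending[portion_id] = data

    def on_mcu_indication(self, portion_id):
        # The MCU is handling this portion: forgo processing it on the hub.
        self.claimed.add(portion_id)
        self.pending.pop(portion_id, None)

    def process_pending(self):
        # Placeholder for voice processing; returns the ids that were handled.
        handled = sorted(self.pending)
        self.pending.clear()
        return handled
```

Because the indication may arrive before or after the corresponding stream data, the sketch checks the claimed set in both handlers, so a claimed portion is dropped in either order.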