US 11,984,123 B2
Network device interaction by range
Sebastien Maury, Paris (FR); Valentin Fage, Paris (FR); Do Kyun Kim, Paris (FR); Daniel Fernandez Castro, Paris (FR); Bjay Watanabe Kamwa, Paris (FR); and Joseph Dureau, Paris (FR)
Assigned to Sonos, Inc., Santa Barbara, CA (US)
Filed by Sonos, Inc., Santa Barbara, CA (US)
Filed on Nov. 11, 2021, as Appl. No. 17/524,377.
Claims priority of provisional application 63/112,756, filed on Nov. 12, 2020.
Prior Publication US 2022/0148592 A1, May 12, 2022
Int. Cl. G10L 15/22 (2006.01)

CPC G10L 15/22 (2013.01) [G10L 2015/225 (2013.01)]

20 Claims

1. A playback device comprising:

a network interface;

one or more processors;

at least one microphone;

at least one speaker;

at least one touch-sensitive sensor; and

data storage having instructions stored thereon that are executable by the one or more processors to cause the playback device to perform functions comprising:

monitoring for (i) user proximity in a first range from the playback device via the at least one touch-sensitive sensor and (ii) user line-of-sight in a second range that is further from the playback device than the first range;

enabling a wakewordless mode when at least one of (i) a touch input is detected via the at least one touch-sensitive sensor or (ii) a user line-of-sight is detected, wherein the wakewordless mode is otherwise disabled;

while operating in the wakewordless mode:

monitoring a sound data stream from the at least one microphone for a plurality of command keywords corresponding to respective functions, wherein voice inputs including one or more of the plurality of command keywords are processable locally on the playback device;

detecting a first voice input in the monitored sound data stream; and

locally processing the first voice input, wherein local processing of the first voice input comprises determining that the detected first voice input includes one or more particular command keywords from among the plurality of command keywords corresponding to respective functions, wherein the first voice input excludes wake words corresponding to any voice assistance service; and

while the wakewordless mode is disabled:

monitoring, via the at least one microphone, the sound data stream for a wake word corresponding to a particular voice assistance service;

detecting, in the monitored sound data stream, a second voice input that includes the wake word corresponding to the particular voice assistance service; and

after detecting the second voice input that includes the wake word corresponding to the particular voice assistance service, processing the second voice input remotely via the particular voice assistance service.
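Claim 1 switches the device between two listening modes according to where the user is detected: by touch in a first, near range, or by line-of-sight in a second range that extends further out. The following is a minimal sketch of that gating decision under stated assumptions, not the patented implementation. The names `ListeningMode`, `PresenceReading`, `select_mode` and the 5 m `LINE_OF_SIGHT_MAX_M` limit are hypothetical. The claim does not say how line-of-sight is sensed, so the sketch assumes some detector reports the distance of a user looking at the device.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ListeningMode(Enum):
    WAKE_WORD = "wake_word"        # wakewordless mode disabled: listen for a VAS wake word
    WAKEWORDLESS = "wakewordless"  # listen for locally processable command keywords


# Hypothetical outer limit of the second (line-of-sight) range, in metres.
# The claim only requires this range to lie further out than the touch range.
LINE_OF_SIGHT_MAX_M = 5.0


@dataclass
class PresenceReading:
    touch_input: bool                 # first range: contact on the touch-sensitive sensor
    gaze_distance_m: Optional[float]  # second range: distance of a user looking at the device, or None


def select_mode(reading: PresenceReading) -> ListeningMode:
    """Enable wakewordless mode if either range reports the user; otherwise disable it."""
    in_first_range = reading.touch_input
    in_second_range = (
        reading.gaze_distance_m is not None
        and reading.gaze_distance_m <= LINE_OF_SIGHT_MAX_M
    )
    if in_first_range or in_second_range:
        return ListeningMode.WAKEWORDLESS
    return ListeningMode.WAKE_WORD
```

For example, `select_mode(PresenceReading(touch_input=False, gaze_distance_m=3.0))` returns `ListeningMode.WAKEWORDLESS`, while a user looking from 8 m away with no touch leaves the device in `ListeningMode.WAKE_WORD`.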

11. A system comprising a first playback device and a second playback device, wherein the first playback device comprises:

a network interface;

one or more processors;

at least one microphone;

at least one speaker;

at least one touch-sensitive sensor;

data storage having instructions stored thereon that are executable by the one or more processors to cause the first playback device to perform functions comprising:

monitoring for (i) user proximity in a first range from the first playback device via the at least one touch-sensitive sensor, (ii) user line-of-sight in a second range that is further from the first playback device than the first range, and (iii) a particular indication that the second playback device detected a user line-of-sight to the second playback device;

enabling a wakewordless mode when at least one of (i) a touch input is detected via the at least one touch-sensitive sensor, (ii) a user line-of-sight is detected, or (iii) the particular indication is received, wherein the wakewordless mode is otherwise disabled;

while operating in the wakewordless mode:

detecting a first voice input in the monitored sound data stream; and

while the wakewordless mode is disabled:

monitoring, via the at least one microphone, the sound data stream for a wake word corresponding to a particular voice assistance service;

detecting, in the monitored sound data stream, a second voice input that includes the wake word corresponding to the particular voice assistance service; and
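Claim 11 adds a third trigger. The first playback device also enables wakewordless mode when the second playback device reports that it detected a user's line-of-sight. The following is a minimal sketch of that cross-device behaviour, assuming an in-process method call stands in for the message the devices would exchange over their network interfaces. The class `PlaybackDevice`, its fields, and its methods are hypothetical illustrations, not an actual Sonos API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlaybackDevice:
    name: str
    touch_input: bool = False
    line_of_sight: bool = False
    # Name of the peer whose line-of-sight indication was last received, if any.
    peer_indication_from: Optional[str] = None
    # Peers to notify; a real system would reach them over the network interface.
    peers: List["PlaybackDevice"] = field(default_factory=list)

    def detect_line_of_sight(self) -> None:
        """Record a local line-of-sight detection and notify every peer."""
        self.line_of_sight = True
        for peer in self.peers:
            peer.receive_line_of_sight_indication(self.name)

    def receive_line_of_sight_indication(self, sender: str) -> None:
        """Handle the 'particular indication' sent by another device."""
        self.peer_indication_from = sender

    @property
    def wakewordless_enabled(self) -> bool:
        # Any of the three triggers enables the mode; otherwise it stays disabled.
        return (
            self.touch_input
            or self.line_of_sight
            or self.peer_indication_from is not None
        )
```

If `living = PlaybackDevice("Living Room", peers=[kitchen])` detects the user's gaze, calling `living.detect_line_of_sight()` also enables wakewordless mode on `kitchen`. The sketch never clears a received indication. The claim does not say when the mode should be disabled again, so a timeout or similar policy would be a further assumption.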

20. A method to be performed by a playback device comprising a network interface, at least one touch-sensitive sensor and at least one microphone, the method comprising:

while operating in the wakewordless mode:

detecting a first voice input in the monitored sound data stream; and

while the wakewordless mode is disabled:

monitoring, via the at least one microphone, the sound data stream for a wake word corresponding to a particular voice assistance service;

detecting, in the monitored sound data stream, a second voice input that includes the wake word corresponding to the particular voice assistance service; and

after detecting the second voice input that includes the wake word corresponding to the particular voice assistance service, processing the second voice input remotely via the particular voice assistance service.
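In claims 1 and 20, the mode determines how a voice input is handled. In wakewordless mode, the device looks for command keywords that it can process locally. Otherwise, only an input containing a voice assistance service (VAS) wake word is passed on for remote processing. The following is a minimal sketch of that routing step. It assumes the audio has already been turned into a text transcript by some on-device recognizer, which the claims do not specify. The keyword table, the wake word `"hey assistant"`, and the function `route_voice_input` are hypothetical.

```python
import string
from typing import List, Tuple

# Hypothetical command keywords mapped to the local function each one triggers.
COMMAND_KEYWORDS = {
    "play": "play",
    "pause": "pause",
    "skip": "next_track",
    "louder": "volume_up",
    "quieter": "volume_down",
}

# Hypothetical wake word for a voice assistance service (VAS).
WAKE_WORD = "hey assistant"


def route_voice_input(transcript: str, wakewordless: bool) -> Tuple[str, List[str]]:
    """Decide how to handle a voice input, assuming it is already a text transcript.

    Returns ("local", functions) for command keywords handled on the device,
    ("remote", []) when the input should go to the VAS, and
    ("no_match", []) or ("ignored", []) when nothing applies.
    """
    text = transcript.lower()
    if wakewordless:
        # Wakewordless mode: look only for command keywords, processed locally.
        words = [w.strip(string.punctuation) for w in text.split()]
        functions = [COMMAND_KEYWORDS[w] for w in words if w in COMMAND_KEYWORDS]
        return ("local", functions) if functions else ("no_match", [])
    # Wakewordless mode disabled: only an input that includes the wake word
    # is handed to the VAS for remote processing.
    if WAKE_WORD in text:
        return ("remote", [])
    return ("ignored", [])
```

For instance, `route_voice_input("skip and then pause", True)` yields `("local", ["next_track", "pause"])`, whereas the same words with the mode disabled are ignored because they contain no wake word. Matching words in a transcript is only a stand-in for keyword spotting, which a real device would more likely run directly on the microphone's sound data stream.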