US 11,694,685 B2
Hotphrase triggering based on a sequence of detections
Victor Carbune, Zürich (CH); and Matthew Sharifi, Kilchberg (CH)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Dec. 10, 2020, as Appl. No. 17/118,251.
Prior Publication US 2022/0189469 A1, Jun. 16, 2022
Int. Cl. G10L 15/00 (2013.01); G10L 15/16 (2006.01); G10L 15/22 (2006.01); G06F 16/2452 (2019.01); G06F 1/3231 (2019.01); G10L 15/28 (2013.01); G10L 15/08 (2006.01)
CPC G10L 15/22 (2013.01) [G06F 1/3231 (2013.01); G06F 16/24522 (2019.01); G10L 15/16 (2013.01); G10L 15/285 (2013.01); G10L 2015/088 (2013.01); G10L 2015/223 (2013.01)]
24 Claims
 
1. A method comprising:
receiving, at data processing hardware of a user device associated with a user, audio data corresponding to an utterance spoken by the user and captured by the user device, the utterance comprising:
a command for a digital assistant to perform an operation;
a hotphrase associated with a set of trigger words; and
one or more other words not associated with the hotphrase that are spoken between a first trigger word in the set of trigger words and a last trigger word in the set of trigger words;
during each of a plurality of fixed-duration time windows of the audio data:
determining, by the data processing hardware, using a hotphrase detector configured to detect each trigger word in the set of trigger words associated with the hotphrase, that the first trigger word in the set of trigger words is detected in the audio data during the corresponding fixed-duration time window;
after determining that the first trigger word in the set of trigger words is detected in the audio data during the corresponding fixed-duration time window, determining, by the data processing hardware, that each other trigger word in the set of trigger words associated with the hotphrase is also detected in the audio data during the corresponding fixed-duration time window;
determining that a sequence of the set of trigger words detected in the audio data during the corresponding fixed-duration time window matches a predefined sequential order associated with the hotphrase; and
based on determining that the sequence of the set of trigger words detected in the audio data during the corresponding fixed-duration time window matches the predefined sequential order associated with the hotphrase, identifying, by the data processing hardware, in the audio data corresponding to the utterance, the hotphrase; and
triggering, by the data processing hardware, an automated speech recognizer (ASR) to perform speech recognition on the audio data when the hotphrase is identified in the audio data corresponding to the utterance.
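
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: it assumes a hypothetical detector that has already emitted time-stamped trigger-word detections (the `Detection` records), and the names `match_in_window`, `find_hotphrase`, `process_utterance`, the 2-second window, the 0.5-second step, and the example trigger words are all illustrative assumptions. Words not associated with the hotphrase produce no detections, so they may fall between trigger words without preventing a match. The sketch treats the order check as an in-order subsequence match, which is one possible reading of "matches a predefined sequential order".

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Detection:
    """One trigger-word detection from a hypothetical hotphrase detector."""
    word: str      # the trigger word that was detected
    time_s: float  # time of the detection within the audio, in seconds


def match_in_window(detections: Sequence[Detection],
                    trigger_words: Sequence[str],
                    start_s: float,
                    window_s: float) -> bool:
    """True if every trigger word is detected, in the predefined order,
    inside the half-open window [start_s, start_s + window_s)."""
    end_s = start_s + window_s
    in_window = sorted(
        (d for d in detections if start_s <= d.time_s < end_s),
        key=lambda d: d.time_s,
    )
    next_idx = 0  # index of the trigger word we are waiting for
    for d in in_window:
        if d.word == trigger_words[next_idx]:
            next_idx += 1
            if next_idx == len(trigger_words):
                return True
    return False


def find_hotphrase(detections: Sequence[Detection],
                   trigger_words: Sequence[str],
                   window_s: float = 2.0,   # assumed fixed window duration
                   step_s: float = 0.5      # assumed hop between windows
                   ) -> Optional[float]:
    """Return the start time of the first window containing the hotphrase,
    or None if no window contains all trigger words in order."""
    if not trigger_words:
        return None
    last_time = max((d.time_s for d in detections), default=0.0)
    n_windows = int(last_time / step_s) + 1
    for i in range(n_windows):
        start_s = i * step_s
        if match_in_window(detections, trigger_words, start_s, window_s):
            return start_s
    return None


def process_utterance(detections: Sequence[Detection],
                      trigger_words: Sequence[str],
                      asr: Callable[[], str],
                      window_s: float = 2.0,
                      step_s: float = 0.5) -> Optional[str]:
    """Run the ASR callable only when the hotphrase is identified."""
    if find_hotphrase(detections, trigger_words, window_s, step_s) is None:
        return None
    return asr()
```

For example, with trigger words `("turn", "volume", "up")` and detections at 0.2 s, 0.7 s, and 1.3 s in that order, the first window already contains the full sequence and `process_utterance` calls the supplied `asr` callable. If the same words are detected out of order, or are spread over more than one window duration, no window matches and the recognizer is never invoked.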