US 12,394,416 B2
Detecting near matches to a hotword or phrase
Matthew Sharifi, Kilchberg (CH); and Victor Carbune, Zurich (CH)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by GOOGLE LLC, Mountain View, CA (US)
Filed on Oct. 27, 2023, as Appl. No. 18/384,764.
Application 18/384,764 is a continuation of application No. 17/081,645, filed on Oct. 27, 2020, granted, now 11,830,486.
Claims priority of provisional application 63/091,237, filed on Oct. 13, 2020.
Prior Publication US 2024/0055002 A1, Feb. 15, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 15/00 (2013.01); G10L 15/08 (2006.01); G10L 15/22 (2006.01); G10L 25/51 (2013.01)
CPC G10L 15/22 (2013.01) [G10L 15/08 (2013.01); G10L 25/51 (2013.01)]
20 Claims
 
1. A method implemented by one or more processors, the method comprising:
receiving, via one or more microphones of a client device, first audio data that captures a first spoken utterance of a user;
processing the first audio data using one or more machine learning models to generate a first predicted output that indicates a probability of one or more hotwords being present in the first audio data;
determining that the first predicted output satisfies a secondary threshold that is less indicative of the one or more hotwords being present in audio data than is a primary threshold but does not satisfy the primary threshold;
receiving, via the one or more microphones of the client device, second audio data that captures a second spoken utterance of a user;
processing the second audio data using the one or more machine learning models to generate a second predicted output that indicates a probability of the one or more hotwords being present in the second audio data;
determining that the second predicted output satisfies the secondary threshold but does not satisfy the primary threshold;
in response to the first predicted output and the second predicted output satisfying the secondary threshold but not satisfying the primary threshold, and in response to the first spoken utterance and the second spoken utterance satisfying one or more temporal criteria relative to one another, identifying a failed hotword attempt; and
in response to identifying the failed hotword attempt:
determining an intended hotword corresponding to the failed hotword attempt, wherein neither the intended hotword nor another supported hotword is included in the first spoken utterance and the second spoken utterance;
providing a hint, comprising displaying the intended hotword on a display of the client device or providing, by the client device, an audio response that includes the intended hotword; and
performing an action corresponding to the intended hotword.
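
The two-threshold logic recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The threshold values, the 10-second window used as the temporal criterion, the `Detection` record, and the rule for choosing the intended hotword (the candidate with the higher probability) are all assumptions made for illustration; the claim does not specify any of them. The sketch also assumes that "satisfying" a threshold means the probability is greater than or equal to it.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed values; the claim only requires that the secondary threshold be
# less indicative of a hotword than the primary threshold.
PRIMARY_THRESHOLD = 0.85
SECONDARY_THRESHOLD = 0.55
MAX_GAP_SECONDS = 10.0  # assumed temporal criterion between the two utterances


@dataclass
class Detection:
    """Hypothetical record of one model output for one spoken utterance."""
    candidate_hotword: str  # hotword the model scored highest
    probability: float      # predicted probability that a hotword is present
    timestamp: float        # time the utterance was captured, in seconds


def is_near_match(d: Detection) -> bool:
    """True if the output satisfies the secondary but not the primary threshold."""
    return SECONDARY_THRESHOLD <= d.probability < PRIMARY_THRESHOLD


def identify_failed_attempt(first: Detection, second: Detection) -> Optional[str]:
    """Return the intended hotword if the pair is a failed hotword attempt."""
    if not (is_near_match(first) and is_near_match(second)):
        return None
    gap = second.timestamp - first.timestamp
    if not (0.0 <= gap <= MAX_GAP_SECONDS):
        return None
    # Assumed heuristic: take the candidate from the more confident output.
    best = max(first, second, key=lambda d: d.probability)
    return best.candidate_hotword


def respond(first: Detection, second: Detection) -> Optional[str]:
    """Build the hint text; a real device would also perform the action."""
    intended = identify_failed_attempt(first, second)
    if intended is None:
        return None
    return f'Did you mean "{intended}"?'
```

For example, two near matches captured four seconds apart, `Detection("stop", 0.60, 0.0)` and `Detection("stop", 0.70, 4.0)`, yield the hint `Did you mean "stop"?`. If either output reaches the primary threshold, or the gap exceeds the assumed window, no failed attempt is identified.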