US 12,322,376 B2
Personalized speech query endpointing based on prior interaction(s) involving a specific phrase preceding an endpoint
Matthew Sharifi, Kilchberg (CH); and Victor Carbune, Zurich (CH)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by GOOGLE LLC, Mountain View, CA (US)
Filed on Jan. 20, 2022, as Appl. No. 17/579,949.
Prior Publication US 2023/0230578 A1, Jul. 20, 2023
Int. Cl. G10L 13/00 (2006.01); G06Q 10/087 (2023.01); G06Q 30/018 (2023.01); G06Q 30/02 (2023.01); G06Q 30/0207 (2023.01); G10L 15/05 (2013.01); G10L 15/16 (2006.01); G10L 15/22 (2006.01); G10L 15/26 (2006.01); G10L 25/75 (2013.01); H04M 1/72466 (2021.01); H04M 3/493 (2006.01); H04R 27/00 (2006.01); B65G 1/127 (2006.01)
CPC G10L 15/05 (2013.01) [G10L 13/00 (2013.01); G10L 15/22 (2013.01); G10L 15/26 (2013.01); B65G 1/127 (2013.01); G06Q 10/087 (2013.01); G06Q 30/0185 (2013.01); G06Q 30/02 (2013.01); G10L 2015/227 (2013.01); H04M 3/4938 (2013.01); H04R 27/00 (2013.01)]
18 Claims
 
1. A method implemented by one or more processors, the method comprising:
processing an audio data stream capturing a spoken utterance of a user, where the audio data stream is captured via one or more microphones of a client device;
detecting a candidate endpoint in the audio data stream; and
determining whether the candidate endpoint is an actual endpoint based on:
a text representation of a portion of the spoken utterance immediately preceding the candidate endpoint, and
a user-specific measure that is based on the text representation and one or more historical interactions with the user, where each of the one or more historical interactions is based on processing a previous audio data stream capturing the user speaking a previous instance of the spoken utterance, where the previous instance of the user speaking the spoken utterance captured in the previous audio data stream is the same spoken utterance spoken by the user captured in the audio data stream, where the historical interactions are specific to the text representation and the user, and where the historical interactions each indicate whether a previous instance of the text representation was a previous endpoint for the user.
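
The determining step of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the candidate endpoint has already been detected, and that speech recognition has already produced the text immediately preceding it. The names EndpointHistory, record, measure and is_actual_endpoint are hypothetical, as are the smoothed-ratio measure, the prior values and the 0.5 threshold; the claim does not specify how the user-specific measure is computed.

```python
from dataclasses import dataclass, field


@dataclass
class EndpointHistory:
    """Hypothetical per-user store of historical interactions.

    Maps a text representation to [times it was an actual endpoint,
    times the user kept speaking after it].
    """
    counts: dict = field(default_factory=dict)

    def record(self, phrase: str, was_endpoint: bool) -> None:
        # Each historical interaction indicates whether the phrase
        # was a previous endpoint for this user.
        entry = self.counts.setdefault(phrase, [0, 0])
        entry[0 if was_endpoint else 1] += 1

    def measure(self, phrase: str, prior: float = 0.5,
                prior_weight: float = 2.0) -> float:
        # Assumed form of the user-specific measure: a smoothed fraction
        # of past occurrences of this phrase that were real endpoints.
        ended, continued = self.counts.get(phrase, (0, 0))
        return (ended + prior * prior_weight) / (ended + continued + prior_weight)


def is_actual_endpoint(phrase: str, history: EndpointHistory,
                       threshold: float = 0.5) -> bool:
    """Decide whether a detected candidate endpoint is an actual endpoint,
    given the text immediately preceding it and the user's history."""
    return history.measure(phrase) >= threshold


# Usage: this user usually stops after "turn on the lights" but
# usually keeps talking after "call".
history = EndpointHistory()
for _ in range(3):
    history.record("turn on the lights", was_endpoint=True)
    history.record("call", was_endpoint=False)

print(is_actual_endpoint("turn on the lights", history))
print(is_actual_endpoint("call", history))
```

Keying the history on both the user and the exact text representation mirrors the claim's requirement that the historical interactions be specific to the text representation and the user. The smoothing prior only keeps the measure defined for phrases with no history.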