US 12,217,751 B2
Digital signal processor-based continued conversation
Victor Carbune, Zürich (CH); and Matthew Sharifi, Kilchberg (CH)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Dec. 15, 2021, as Appl. No. 17/644,394.
Claims priority of provisional application 63/262,447, filed on Oct. 13, 2021.
Prior Publication US 2023/0113883 A1, Apr. 13, 2023
Int. Cl. G10L 15/22 (2006.01); G06F 16/9032 (2019.01); G10L 15/26 (2006.01); G10L 25/78 (2013.01)
CPC G10L 15/22 (2013.01) [G06F 16/90332 (2019.01); G10L 15/26 (2013.01); G10L 25/78 (2013.01)]
30 Claims
 
1. A computer-implemented method that, when executed on data processing hardware of an assistant-enabled device, causes the data processing hardware to perform operations comprising:
in response to receiving a response to an initial query submitted by a user of the assistant-enabled device to a digital assistant, instructing an always-on first processor of the data processing hardware to operate in a follow-on query detection mode and an active second processor of the data processing hardware to return to a sleep state; and
while the always-on first processor operates in the follow-on query detection mode:
receiving, at the always-on first processor, follow-on audio data captured by the assistant-enabled device in an environment of the assistant-enabled device, the follow-on audio data not containing a hotword;
determining, using a voice activity detection (VAD) model executing on the always-on first processor, whether or not the VAD model detects voice activity in the follow-on audio data;
performing, using a speaker identification (SID) model executing on the always-on first processor, speaker verification on the follow-on audio data to determine whether the follow-on audio data comprises an utterance spoken by the same user that submitted the initial query to the digital assistant; and
when the VAD model detects voice activity in the follow-on audio data and the follow-on audio data comprises the utterance spoken by the same user that submitted the initial query, initiating a wake-up process on the second processor of the data processing hardware to determine whether the utterance comprises a follow-on query directed toward the digital assistant.
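The gating logic recited in claim 1 can be sketched as a small state machine. This is a minimal, hypothetical illustration, not the patentee's implementation. All class, function, and parameter names (`AssistantDevice`, `energy_vad`, `sid_threshold`, and so on) are invented for the example. The trained VAD model is replaced by a simple mean-square energy threshold, and the SID model is reduced to a cosine-similarity check between a speaker embedding of the follow-on audio and the embedding captured with the initial query. The sketch also assumes that those embeddings have already been computed on the always-on processor. The wake-up is only initiated when both checks pass. Deciding whether the utterance is actually a follow-on query is left to the second processor and is not modeled here.

```python
import math
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional


class DspMode(Enum):
    """Hypothetical operating modes of the always-on first processor."""
    HOTWORD_DETECTION = auto()
    FOLLOW_ON_QUERY_DETECTION = auto()


class ApState(Enum):
    """Hypothetical power states of the second processor."""
    SLEEP = auto()
    ACTIVE = auto()


def energy_vad(frame: List[float], threshold: float = 0.01) -> bool:
    """Stand-in for the VAD model: mean-square energy above a threshold."""
    if not frame:
        return False
    return sum(x * x for x in frame) / len(frame) > threshold


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two speaker embeddings (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class AssistantDevice:
    dsp_mode: DspMode = DspMode.HOTWORD_DETECTION
    ap_state: ApState = ApState.ACTIVE  # AP is busy handling the initial query
    query_speaker_embedding: Optional[List[float]] = None
    sid_threshold: float = 0.7  # hypothetical speaker-verification threshold
    wakeups: int = 0

    def on_response_to_initial_query(self, speaker_embedding: List[float]) -> None:
        """Switch the DSP to follow-on detection and put the AP to sleep."""
        self.query_speaker_embedding = list(speaker_embedding)
        self.dsp_mode = DspMode.FOLLOW_ON_QUERY_DETECTION
        self.ap_state = ApState.SLEEP

    def process_follow_on_audio(self, frame: List[float],
                                speaker_embedding: List[float]) -> bool:
        """Run VAD and SID on hotword-free audio; wake the AP only if both pass."""
        if self.dsp_mode is not DspMode.FOLLOW_ON_QUERY_DETECTION:
            return False
        voice_detected = energy_vad(frame)
        same_speaker = (
            self.query_speaker_embedding is not None
            and cosine_similarity(speaker_embedding,
                                  self.query_speaker_embedding) >= self.sid_threshold
        )
        if voice_detected and same_speaker:
            self.ap_state = ApState.ACTIVE  # initiate the wake-up process
            self.wakeups += 1
            return True
        return False


if __name__ == "__main__":
    device = AssistantDevice()
    device.on_response_to_initial_query([1.0, 0.0, 0.0])
    silence = [0.0] * 160
    speech = [0.5, -0.5] * 80
    print(device.process_follow_on_audio(silence, [1.0, 0.0, 0.0]))  # False: no voice
    print(device.process_follow_on_audio(speech, [0.0, 1.0, 0.0]))   # False: other speaker
    print(device.process_follow_on_audio(speech, [0.9, 0.1, 0.0]))   # True: wake the AP
```

Requiring both checks to pass on the low-power processor reflects the claim's aim. The second processor stays asleep for silence, for background noise, and for speech from anyone other than the user who issued the initial query.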