US 12,165,641 B2
History-based ASR mistake corrections
Patrick Siegler, Zurich (CH); Aurélien Boffy, Basel (CH); and Ágoston Weisz, Zurich (CH)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Jul. 11, 2022, as Appl. No. 17/811,650.
Prior Publication US 2024/0013782 A1, Jan. 11, 2024
Int. Cl. G10L 15/22 (2006.01); G10L 15/08 (2006.01)
CPC G10L 15/22 (2013.01) [G10L 15/08 (2013.01); G10L 2015/088 (2013.01)] 28 Claims
1. A computer-implemented method executed on data processing hardware that causes the data processing hardware to perform operations comprising:
receiving initial audio data captured by an assistant-enabled device while the assistant-enabled device is in a sleep state, the initial audio data comprising a hotword and a previous query submitted by a user of the assistant-enabled device to a digital assistant, the hotword, when detected by the assistant-enabled device, causing the assistant-enabled device to wake from the sleep state and trigger a speech recognizer to perform speech recognition on at least a portion of the initial audio data that includes the previous query;
after the speech recognizer performs speech recognition on at least the portion of the initial audio data, instructing the assistant-enabled device to operate in a follow-on query mode;
receiving follow-on audio data captured by the assistant-enabled device during operation of the assistant-enabled device in the follow-on query mode, the follow-on audio data corresponding to a follow-on query spoken by the user of the assistant-enabled device to the digital assistant subsequent to the user submitting the previous query to the digital assistant;
processing, using the speech recognizer, the follow-on audio data to generate multiple candidate hypotheses, each candidate hypothesis corresponding to a candidate transcription for the follow-on query and represented by a respective sequence of hypothesized terms;
for each corresponding candidate hypothesis among the multiple candidate hypotheses:
determining a corresponding similarity metric between the previous query and the corresponding candidate hypothesis, the similarity metric indicating a similarity between a topic associated with the corresponding candidate hypothesis and a topic associated with the previous query; and
obtaining a corresponding likelihood score that the speech recognizer assigned to the corresponding candidate hypothesis;
ranking the multiple candidate hypotheses based on the corresponding likelihood scores assigned to the multiple candidate hypotheses by the speech recognizer and the corresponding similarity metrics determined for each of the corresponding candidate hypotheses among the multiple candidate hypotheses; and
determining a transcription of the follow-on query spoken by the user based on the ranking of the multiple candidate hypotheses.
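The reranking steps of claim 1 (scoring each candidate hypothesis against the previous query, then ranking on both that similarity and the recognizer's likelihood score) can be illustrated with a minimal Python sketch. The claim does not specify how the topic similarity metric is computed or how it is combined with the likelihood score, so both choices here are assumptions. Bag-of-words cosine similarity stands in for topic similarity, and the two scores are linearly interpolated with a hypothetical parameter `weight`. The names `Hypothesis`, `topic_similarity`, `rank_hypotheses`, and `transcribe_follow_on` are hypothetical and do not come from the patent.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One candidate transcription from the speech recognizer (hypothetical type)."""
    text: str
    likelihood: float  # assumed: recognizer score normalized to [0, 1]


def topic_similarity(previous_query: str, candidate: str) -> float:
    """Cosine similarity of bag-of-words term counts.

    Assumption: word overlap is used as a crude proxy for the claimed
    "topic" similarity; a real system might compare embeddings instead.
    """
    a = Counter(previous_query.lower().split())
    b = Counter(candidate.lower().split())
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty query or hypothesis shares no topic
    return dot / (norm_a * norm_b)


def rank_hypotheses(previous_query: str, hypotheses: list[Hypothesis],
                    weight: float = 0.5) -> list[Hypothesis]:
    """Rank candidates by an interpolation of likelihood and similarity.

    Assumption: combined = (1 - weight) * likelihood + weight * similarity.
    With weight = 0 this reduces to the recognizer's own ranking.
    """
    scored = []
    for hyp in hypotheses:
        similarity = topic_similarity(previous_query, hyp.text)
        combined = (1.0 - weight) * hyp.likelihood + weight * similarity
        scored.append((combined, hyp))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [hyp for _, hyp in scored]


def transcribe_follow_on(previous_query: str, hypotheses: list[Hypothesis],
                         weight: float = 0.5) -> str:
    """Pick the top-ranked hypothesis as the follow-on transcription."""
    return rank_hypotheses(previous_query, hypotheses, weight)[0].text
```

For example, after the previous query "how tall is the eiffel tower", suppose the recognizer scores "how old is the eye full tower" at 0.55 and "how old is the eiffel tower" at 0.45. On likelihood alone (`weight=0.0`) the first, misrecognized hypothesis wins. With `weight=0.5` the second hypothesis ranks first, because it shares more terms with the previous query (similarity 5/6 versus roughly 0.62). The weight controls how strongly conversational history can override the acoustic evidence.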