US 12,080,290 B2
Determining dialog states for language models
Petar Aleksic, Jersey City, NJ (US); and Pedro Jose Moreno Mengibar, Jersey City, NJ (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Feb. 10, 2022, as Appl. No. 17/650,567.
Application 17/650,567 is a continuation of application No. 16/732,645, filed on Jan. 2, 2020, granted, now 11,264,028.
Application 16/732,645 is a continuation of application No. 15/983,768, filed on May 18, 2018, granted, now 10,553,214, issued on Feb. 4, 2020.
Prior Publication US 2022/0165270 A1, May 26, 2022
Int. Cl. G10L 15/22 (2006.01); G06F 40/295 (2020.01); G06F 40/30 (2020.01); G10L 15/26 (2006.01); G10L 15/065 (2013.01); G10L 15/183 (2013.01); G10L 15/197 (2013.01)
CPC G10L 15/22 (2013.01) [G06F 40/295 (2020.01); G06F 40/30 (2020.01); G10L 15/26 (2013.01); G10L 15/065 (2013.01); G10L 15/183 (2013.01); G10L 15/197 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A computer-implemented method that, when executed on data processing hardware, causes the data processing hardware to perform operations comprising:
receiving a transcription request for a voice input captured by a user device, the transcription request comprising audio data and context data, the audio data indicating the voice input and the context data indicating:
an application to which the voice input is directed; and
a particular stage from a multi-stage voice activity corresponding to a series of user interactions related to a task of the application;
based on the context data, determining, from among multiple possible dialogs, a particular dialog corresponding to the particular stage from the multi-stage voice activity;
based on the particular dialog corresponding to the particular stage from the multi-stage voice activity, filtering the audio data to only include audio data associated with the particular dialog;
identifying, from a set of n-grams, a representative subset of n-grams associated with the particular dialog corresponding to the particular stage from the multi-stage voice activity based on one or more prior stages from the multi-stage voice activity corresponding to one or more prior transcription requests, each n-gram of the set of n-grams comprising a respective non-zero probability score;
biasing a language model by increasing the respective non-zero probability score for each n-gram of the identified representative subset of n-grams associated with the particular dialog; and
processing, using the biased language model, the filtered audio data to generate a transcription of the voice input.
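The biasing step recited above can be illustrated with a minimal sketch. This is not the patented implementation; the n-grams, dialog states, boost factor, and function names below are all hypothetical, and real systems bias log-probabilities inside a decoder rather than a flat dictionary. The sketch only shows the claim's core idea: each n-gram carries a non-zero probability score, a representative subset is selected for the dialog state of the current stage, and the scores of that subset are increased before transcription.

```python
# Hypothetical n-gram scores: each n-gram has a non-zero probability,
# mirroring the claim's "respective non-zero probability score".
NGRAM_SCORES = {
    "play music": 0.02,
    "next song": 0.02,
    "set an alarm": 0.02,
    "what time is it": 0.02,
    "volume up": 0.02,
}

# Hypothetical mapping from a dialog state (the "particular dialog"
# for a stage of the multi-stage voice activity) to its representative
# subset of n-grams.
DIALOG_NGRAMS = {
    "music_playback": ["play music", "next song", "volume up"],
    "clock": ["set an alarm", "what time is it"],
}

def bias_language_model(scores, dialog, boost=5.0):
    """Return a copy of the n-gram scores in which the n-grams
    associated with the given dialog state are boosted, then
    renormalize so the scores again sum to 1."""
    subset = set(DIALOG_NGRAMS.get(dialog, []))
    biased = {ngram: p * (boost if ngram in subset else 1.0)
              for ngram, p in scores.items()}
    total = sum(biased.values())
    return {ngram: p / total for ngram, p in biased.items()}

biased = bias_language_model(NGRAM_SCORES, "music_playback")
# N-grams tied to the active dialog now outscore unrelated ones,
# steering the recognizer toward in-context transcriptions.
assert biased["next song"] > biased["set an alarm"]
```

In a full recognizer the boosted scores would then be used when scoring candidate transcriptions of the filtered audio, corresponding to the claim's final "processing, using the biased language model" step.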