US 12,437,756 B2
Cross-lingual speech recognition
Petar Aleksic, Jersey City, NJ (US); and Pedro J. Moreno Mengibar, Jersey City, NJ (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Aug. 3, 2022, as Appl. No. 17/817,176.
Application 17/817,176 is a continuation of application No. 16/593,564, filed on Oct. 4, 2019, granted, now 11,437,025.
Claims priority of provisional application 62/741,250, filed on Oct. 4, 2018.
Prior Publication US 2022/0383862 A1, Dec. 1, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 15/18 (2013.01); G10L 15/02 (2006.01); G10L 15/187 (2013.01); G10L 15/22 (2006.01)
CPC G10L 15/187 (2013.01) [G10L 15/02 (2013.01); G10L 15/22 (2013.01); G10L 2015/025 (2013.01)]
20 Claims
 
1. A computer-implemented method when executed by data processing hardware causes the data processing hardware to perform operations comprising:
determining a context of a computing device, the computing device comprising a lexicon including multiple terms in a first language and a pronunciation for each of the multiple terms in the first language;
based on the context of the computing device:
identifying, for inclusion in the lexicon, one or more additional terms and a pronunciation for each of the one or more additional terms, the one or more additional terms in a second language different from the first language, wherein each respective term of the multiple terms in the first language and each respective term of the one or more additional terms in the second language comprises a corresponding likelihood score; and
for each respective term of the one or more additional terms in the second language, biasing the corresponding likelihood score based on the context of the computing device;
receiving audio data of an utterance comprising at least one word in the first language and at least one word in the second language;
based on the corresponding likelihood scores of the multiple terms in the first language and the biased corresponding likelihood scores of the one or more additional terms in the second language, generating, by performing speech recognition on the received audio data of the utterance using the lexicon, a transcription of the utterance, the transcription comprising the at least one word in the first language and the at least one word in the second language; and
providing, for output, the transcription of the utterance.
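
The operations recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (Entry, CONTEXT_TERMS, extend_lexicon, recognize), the "music_app" context label, the example terms, the phoneme symbols, and the numeric scores are hypothetical and chosen only for illustration. The sketch also assumes that an acoustic front end, not shown, has already converted the audio data into a phoneme sequence, and it stands in for speech recognition with a simple dynamic-programming segmentation that maximizes the summed log-likelihood scores of the lexicon terms.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One lexicon term with its pronunciation and likelihood score (hypothetical layout)."""
    term: str
    language: str
    pronunciation: tuple  # phoneme symbols, illustrative only
    log_score: float      # log of the term's likelihood score


def build_lexicon():
    # Lexicon of terms in the first language (English here, as an assumption).
    return {
        "play": Entry("play", "en", ("p", "l", "ey"), math.log(0.05)),
        "call": Entry("call", "en", ("k", "ao", "l"), math.log(0.05)),
    }


# Hypothetical mapping from a device context to second-language terms.
CONTEXT_TERMS = {
    "music_app": [
        Entry("despacito", "es",
              ("d", "e", "s", "p", "a", "s", "i", "t", "o"), math.log(0.001)),
    ],
}


def extend_lexicon(lexicon, context, boost=math.log(50.0)):
    """Add the context's second-language terms and bias their scores upward."""
    extended = dict(lexicon)
    for e in CONTEXT_TERMS.get(context, []):
        extended[e.term] = Entry(e.term, e.language, e.pronunciation,
                                 e.log_score + boost)
    return extended


def recognize(phonemes, lexicon):
    """Toy decoder: best-scoring segmentation of phonemes into lexicon terms.

    Returns a list of terms, or None if no segmentation covers the input.
    """
    n = len(phonemes)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, [])
    for i in range(n):
        score_i, words_i = best[i]
        if words_i is None:
            continue  # position i is not reachable
        for e in lexicon.values():
            j = i + len(e.pronunciation)
            if j <= n and tuple(phonemes[i:j]) == e.pronunciation:
                candidate = score_i + e.log_score
                if candidate > best[j][0]:
                    best[j] = (candidate, words_i + [e.term])
    return best[n][1]


# Utterance with one first-language word and one second-language word.
utterance = ("p", "l", "ey", "d", "e", "s", "p", "a", "s", "i", "t", "o")

base = build_lexicon()
extended = extend_lexicon(base, "music_app")

words = recognize(utterance, extended)
transcription = " ".join(words) if words else ""
print(transcription)  # prints "play despacito"
```

With the unextended lexicon, recognize returns None for the same input because no first-language term matches the second-language portion of the utterance. In this toy input only one segmentation exists, so the bias does not change the result; in a real recognizer the biased scores would matter when several hypotheses compete for the same audio.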