US 12,307,213 B2
Automatic speech recognition systems and processes
Kshitiz Kumar, Redmond, WA (US); Jian Wu, Bellevue, WA (US); Bo Ren, Bellevue, WA (US); Tianyu Wu, Suzhou (CN); Fahimeh Bahmaninezhad, San Mateo, CA (US); Edward C. Lin, Beijing (CN); Xiaoyang Chen, Suzhou (CN); and Changliang Liu, Bellevue, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Jun. 9, 2022, as Appl. No. 17/836,390.
Prior Publication US 2023/0401392 A1, Dec. 14, 2023
Int. Cl. G06F 40/58 (2020.01); G10L 15/00 (2013.01); G10L 15/06 (2013.01); G10L 15/16 (2006.01)
CPC G06F 40/58 (2020.01) [G10L 15/005 (2013.01); G10L 15/063 (2013.01); G10L 15/16 (2013.01)]
14 Claims
 
1. A data processing system comprising:
a processor; and
a machine-readable storage medium storing executable instructions that, when executed, cause the processor to perform operations comprising:
receiving speech data for a plurality of languages;
identifying and extracting graphemes from the speech data using a grapheme extraction engine;
normalizing the speech data using a normalizing engine that applies linguistic based rules for Latin-based languages to map the graphemes from the speech data to graphemes in a Latin-based language;
building a computer model using the normalized speech data;
fine-tuning the computer model using additional speech data; and
recognizing words in a target language using the fine-tuned computer model,
wherein the computer model is a Long Short-Term Memory model that has a top layer fine-tuned by the additional speech data.
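
The grapheme extraction and normalization steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the speech data carries text transcriptions, treats a grapheme as a base character plus any trailing combining marks, and uses Unicode decomposition in place of the claimed linguistic rules. The names `EXTRA_RULES`, `extract_graphemes`, `normalize_grapheme`, and `normalize_transcript` are hypothetical, as are the entries in the rule table.

```python
import unicodedata

# Hypothetical rule table (an assumption, not from the patent): graphemes
# that Unicode decomposition alone does not reduce to basic Latin letters.
EXTRA_RULES = {"ß": "ss", "æ": "ae", "œ": "oe", "ø": "o", "ł": "l", "đ": "d"}


def extract_graphemes(text):
    """Split text into graphemes: a base character plus following combining marks."""
    graphemes = []
    for ch in unicodedata.normalize("NFC", text):
        if graphemes and unicodedata.combining(ch):
            # Attach a combining mark to the grapheme it modifies.
            graphemes[-1] += ch
        elif not ch.isspace():
            graphemes.append(ch)
    return graphemes


def normalize_grapheme(grapheme):
    """Map one grapheme to a basic Latin grapheme (illustrative rules only)."""
    grapheme = grapheme.lower()
    if grapheme in EXTRA_RULES:
        return EXTRA_RULES[grapheme]
    # Decompose accented letters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", grapheme)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def normalize_transcript(text):
    """Extract graphemes from a transcript and normalize each one."""
    return [normalize_grapheme(g) for g in extract_graphemes(text)]


if __name__ == "__main__":
    for word in ("Straße", "Ñandú", "smörgåsbord"):
        print(word, normalize_transcript(word))
```

Mapping transcripts from several Latin-based languages onto one shared grapheme inventory in this way would let a single model be built on the pooled data before it is fine-tuned for a target language. A production rule set would likely be language-specific rather than a single global table.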