| CPC G06F 40/58 (2020.01) [G10L 15/005 (2013.01); G10L 15/063 (2013.01); G10L 15/16 (2013.01)] | 14 Claims |

|
1. A data processing system comprising:
a processor; and
a machine-readable storage medium storing executable instructions that, when executed, cause the processor to perform operations comprising:
receiving speech data for a plurality of languages;
identifying and extracting graphemes from the speech data using a grapheme extraction engine;
normalizing the speech data using a normalizing engine that applies linguistic based rules for Latin-based languages to map the graphemes from the speech data to graphemes in a Latin-based language;
building a computer model using the normalized speech data;
fine-tuning the computer model using additional speech data; and
recognizing words in a target language using the fine-tuned computer model,
wherein the computer model is a Long Short-Term Memory model that has a top layer fine-tuned by the additional speech data.
|
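
The grapheme extraction and normalization steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation: it assumes the graphemes are taken from text transcripts that accompany the speech data, and the rule table `RULES` and the function names `extract_graphemes`, `normalize_grapheme`, and `normalize_transcript` are hypothetical names chosen for illustration. Only the standard-library `unicodedata` module is used.

```python
import unicodedata

# Hypothetical rule table (an assumption, not from the claim): maps graphemes
# that have no Unicode decomposition onto graphemes of a Latin-based target.
RULES = {
    "ß": "ss",
    "æ": "ae",
    "œ": "oe",
    "ø": "o",
    "ł": "l",
    "đ": "d",
}


def extract_graphemes(text):
    """Split text into graphemes: a base character plus its combining marks."""
    graphemes = []
    for ch in unicodedata.normalize("NFD", text.lower()):
        if unicodedata.combining(ch) and graphemes:
            # Attach a combining mark to the preceding base character.
            graphemes[-1] += ch
        else:
            graphemes.append(ch)
    # Recompose so each grapheme is returned in its precomposed form.
    return [unicodedata.normalize("NFC", g) for g in graphemes]


def normalize_grapheme(grapheme):
    """Map one grapheme to the target grapheme set."""
    if grapheme in RULES:
        return RULES[grapheme]
    # Fallback rule: drop diacritics by removing combining marks.
    decomposed = unicodedata.normalize("NFD", grapheme)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def normalize_transcript(text):
    """Extract graphemes from a transcript and normalize each one."""
    return "".join(normalize_grapheme(g) for g in extract_graphemes(text))


if __name__ == "__main__":
    for word in ["Straße", "café", "Łódź", "smørbrød"]:
        print(word, "->", normalize_transcript(word))
```

Under these assumptions, transcripts from several languages end up in one shared grapheme inventory, which is the property the model-building and fine-tuning steps of the claim depend on. A real rule set would be language-specific and considerably larger than the six entries shown here.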