| CPC G10L 15/16 (2013.01) [G10L 15/005 (2013.01); G10L 15/063 (2013.01); G10L 15/22 (2013.01)] | 19 Claims |

|
1. A computer-implemented method for improved recognition of multiple languages in audio data, the method comprising:
training a multilingual neural network model on first input audio data, the multilingual neural network model including shared acoustic model layers and a single projection layer, and the first input audio data including speech in a primary language and a secondary language;
splitting the single projection layer of the multilingual neural network model to produce a split head multilingual neural network model;
training the split head multilingual neural network model on second input audio data, the second input audio data including speech in the primary language and the secondary language to generate a trained split head multilingual neural network model, the trained split head multilingual neural network model including shared acoustic model layers and a plurality of projection layers, each projection layer of the plurality of projection layers corresponding to a language that the trained split head multilingual neural network model recognizes;
receiving audio data, the audio data including speech in a plurality of languages in the audio data, the speech in the plurality of languages corresponding to the language recognized by a projection layer of the plurality of projection layers of the trained split head multilingual neural network model; and
classifying one or more languages of the speech of the audio data using the trained split head multilingual neural network model.
|
|
9. A system for improved recognition of multiple languages in audio data, the system including:
a data storage device that stores instructions for improved recognition of multiple languages in audio data; and
a processor configured to execute the instructions to perform a method including:
training a multilingual neural network model on first input audio data, the multilingual neural network model including shared acoustic model layers and a single projection layer, and the first input audio data including speech in a primary language and a secondary language;
splitting the single projection layer of the multilingual neural network model to produce a split head multilingual neural network model;
training the split head multilingual neural network model on second input audio data, the second input audio data including speech in the primary language and the secondary language to generate a trained split head multilingual neural network model, the trained split head multilingual neural network model including shared acoustic model layers and a plurality of projection layers, each projection layer of the plurality of projection layers corresponding to a language that the trained split head multilingual neural network model recognizes;
receiving audio data, the audio data including speech in a plurality of languages in the audio data, the speech in the plurality of languages corresponding to the language recognized by a projection layer of the plurality of projection layers of the trained split head multilingual neural network model; and
classifying one or more languages of the speech of the audio data using the trained split head multilingual neural network model.
|
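The splitting step recited in claims 1 and 9 can be illustrated with a minimal sketch. The claims do not specify how the single projection layer is split, so everything below is an assumption made for illustration only: the projection layer is modeled as a plain weight matrix plus bias, each language is assumed to own a disjoint set of output units, and each new head is initialized by copying the rows for its language. The names `make_projection`, `split_projection`, `project`, and `language_outputs` are hypothetical and do not come from the patent.

```python
import random


def make_projection(hidden_dim, num_outputs, seed=0):
    """Build a toy single projection layer (one weight row per output unit)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
               for _ in range(num_outputs)]
    bias = [0.0] * num_outputs
    return {"weights": weights, "bias": bias}


def split_projection(projection, language_outputs):
    """Split one projection layer into one head per language.

    Assumption: language_outputs maps each language to the output-unit
    indices it owns. Each head receives its own copy of those rows, so
    later training can update the heads independently.
    """
    heads = {}
    for lang, indices in language_outputs.items():
        heads[lang] = {
            "weights": [list(projection["weights"][i]) for i in indices],
            "bias": [projection["bias"][i] for i in indices],
        }
    return heads


def project(layer, hidden):
    """Apply a projection layer (or a single head) to a hidden vector."""
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(layer["weights"], layer["bias"])]


# Hypothetical setup: 5 output units, 3 for the primary language, 2 for the secondary.
single = make_projection(hidden_dim=4, num_outputs=5)
language_outputs = {"primary": [0, 1, 2], "secondary": [3, 4]}
heads = split_projection(single, language_outputs)

# Stand-in for the output of the shared acoustic model layers.
hidden = [0.5, -1.0, 0.25, 2.0]
full_logits = project(single, hidden)
primary_logits = project(heads["primary"], hidden)
secondary_logits = project(heads["secondary"], hidden)
```

Under these assumptions, each head initially reproduces the single layer's outputs for its own language, and the second training pass on the split head model is what would let the heads diverge while the shared acoustic model layers stay common.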