| CPC G10L 15/063 (2013.01) [G06N 3/08 (2013.01); G10L 15/02 (2013.01); G10L 15/10 (2013.01); G10L 15/16 (2013.01)] | 25 Claims |

|
1. A computer-implemented method for preparing training data for a speech recognition model, the method comprising:
obtaining a plurality of audio data sets, each audio data set having a different acoustic feature; and
training a recurrent neural network transducer speech recognition model by sorting sentences from the plurality of audio data sets so that similar sentences from different audio data sets are positioned closely as a primary constraint by utilizing a similarity-score dependent penalty imposed for composed dissimilar data based on distances between sentences on a word vector and at least two hyperparameters, while imposing a secondary constraint on audio length by comparing audio distances between the sentences from utterances extracted from the plurality of audio data sets.
|
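The sorting step recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation, and it does not cover training the recurrent neural network transducer itself. The following are all assumptions chosen for illustration:

- the hypothetical `Utterance` record;
- cosine distance between precomputed sentence vectors;
- a hinge-style penalty using two hyperparameters, `alpha` (penalty weight) and `tau` (distance threshold);
- a greedy nearest-neighbour ordering.

The primary constraint is the first element of a lexicographic key, and the audio-length difference (the secondary constraint) is the second. Audio length therefore only decides between candidates whose word-vector penalties are equal.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Utterance:
    """Hypothetical record for one sentence drawn from one audio data set."""
    dataset: str             # audio data set (acoustic condition) it came from
    vector: Sequence[float]  # sentence vector on a word-vector space (assumed precomputed)
    audio_len: float         # audio duration in seconds


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; treats zero vectors as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def dissimilarity_penalty(dist: float, alpha: float, tau: float) -> float:
    """Assumed similarity-dependent penalty: zero while the word-vector distance
    stays below tau, then growing linearly with slope alpha."""
    return alpha * max(0.0, dist - tau)


def ordering_key(cur: Utterance, cand: Utterance, idx: int,
                 alpha: float, tau: float) -> Tuple[float, float, int]:
    """Lexicographic key: (primary penalty, secondary audio-length gap, index)."""
    primary = dissimilarity_penalty(cosine_distance(cur.vector, cand.vector), alpha, tau)
    secondary = abs(cur.audio_len - cand.audio_len)
    return (primary, secondary, idx)  # index breaks remaining ties deterministically


def sort_utterances(utts: List[Utterance], alpha: float = 10.0,
                    tau: float = 0.2) -> List[int]:
    """Greedy ordering: each next utterance is the remaining one with the
    smallest key relative to the utterance placed just before it."""
    if not utts:
        return []
    remaining = set(range(1, len(utts)))
    order = [0]
    while remaining:
        cur = utts[order[-1]]
        nxt = min(remaining, key=lambda j: ordering_key(cur, utts[j], j, alpha, tau))
        order.append(nxt)
        remaining.remove(nxt)
    return order


if __name__ == "__main__":
    utts = [
        Utterance("clean", [1.0, 0.0], 3.0),
        Utterance("noisy", [0.0, 1.0], 5.0),
        Utterance("noisy", [0.99, 0.1], 3.2),
        Utterance("clean", [0.1, 1.0], 5.1),
    ]
    print(sort_utterances(utts))  # [0, 2, 3, 1]
```

In this toy run the ordering alternates between the "clean" and "noisy" sets, so sentences with close word vectors end up adjacent even though they come from different data sets. A greedy chain was chosen only for brevity; a batch-level or globally optimized arrangement would satisfy the same two constraints.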