| CPC G10L 15/063 (2013.01) [G10L 15/26 (2013.01)] | 20 Claims |

|
1. A method comprising:
receiving, at a computer system, a starting ASR (Automated Speech Recognition) neural network model for a language, the starting ASR neural network model configured to receive an audio file in the language and generate a starting transcript of the audio file;
receiving, at the computer system, a request to generate a specialized version of the starting ASR neural network model, the specialized version having training for a predetermined category of audio;
based on the request, executing an analysis of the audio file via at least one processor of the computer system, resulting in timestamps associated with each word in the audio file and a confidence of each word;
executing, via the at least one processor, the starting ASR neural network model on the audio file using an ASR architecture, resulting in the starting transcript;
generating, via the at least one processor, training data based on:
the timestamps of each word;
the confidence of each word; and
the starting transcript; and
training, via the at least one processor, an updated ASR neural network model using the training data, resulting in the specialized version of the starting ASR neural network model.
|
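
The training-data generation step of claim 1 combines three inputs: per-word timestamps, per-word confidences, and the starting transcript. The claim does not say how they are combined, so the following is only a minimal sketch under stated assumptions. It assumes that the analysis and the starting transcript tokenize the audio into the same word sequence, and that a word is kept only when both sources agree on it and its confidence meets a threshold. The names `WordAnalysis`, `TrainingSegment`, `build_training_segments`, and `min_confidence` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class WordAnalysis:
    """One word from the audio analysis (hypothetical structure)."""
    word: str
    start: float       # start time in seconds
    end: float         # end time in seconds
    confidence: float  # assumed to lie in the range 0.0 to 1.0


@dataclass
class TrainingSegment:
    """A span of audio paired with the text assumed to be spoken in it."""
    start: float
    end: float
    text: str


def _close(run):
    # Merge a run of consecutive kept words into one segment.
    return TrainingSegment(run[0].start, run[-1].end,
                           " ".join(w.word for w in run))


def build_training_segments(words, transcript, min_confidence=0.8):
    """Group consecutive trusted words into training segments.

    Assumption: the transcript splits on whitespace into tokens that line
    up by position with `words`. A word is trusted when the transcript
    token at the same position matches it (case-insensitively) and its
    confidence is at least `min_confidence`.
    """
    tokens = transcript.split()
    segments = []
    current = []
    for i, w in enumerate(words):
        token = tokens[i] if i < len(tokens) else None
        agrees = token is not None and token.lower() == w.word.lower()
        if agrees and w.confidence >= min_confidence:
            current.append(w)
        elif current:
            # An untrusted word ends the current segment.
            segments.append(_close(current))
            current = []
    if current:
        segments.append(_close(current))
    return segments


words = [
    WordAnalysis("hello", 0.0, 0.4, 0.95),
    WordAnalysis("world", 0.4, 0.9, 0.90),
    WordAnalysis("uh", 0.9, 1.0, 0.30),
    WordAnalysis("again", 1.0, 1.5, 0.85),
]
segments = build_training_segments(words, "hello world uh again")
```

In this example the low-confidence word "uh" is dropped and splits the audio into two segments, one covering "hello world" and one covering "again". Each segment's time span and text could then serve as one audio and label pair for training the updated model. A real pipeline would likely need a proper alignment step rather than position-by-position matching.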