US 12,394,407 B2
System and method for training domain-specific speech recognition language models
Vahram Edward Sukyas, Los Angeles, CA (US); Jeffrey G. Hopkins, Lincoln, RI (US); Sebastien Paré, Mississauga (CA); Phani Kumar Srinivas Savatham, Telangana (IN); and Sateesh Puvvalla, Telangana (IN)
Assigned to VIQ Solutions Inc., Mississauga (CA)
Filed by VIQ Solutions Inc., Mississauga (CA)
Filed on May 23, 2023, as Appl. No. 18/322,051.
Prior Publication US 2024/0395243 A1, Nov. 28, 2024
Int. Cl. G10L 15/26 (2006.01); G10L 15/06 (2013.01)
CPC G10L 15/063 (2013.01) [G10L 15/26 (2013.01)]
20 Claims
 
1. A method comprising:
receiving, at a computer system, a starting ASR (Automated Speech Recognition) neural network model for a language, the starting ASR neural network model configured to receive an audio file in the language and generate a starting transcript of the audio file;
receiving, at the computer system, a request to generate a specialized version of the starting ASR neural network model, the specialized version having training for a predetermined category of audio;
based on the request, executing an analysis of the audio file via at least one processor of the computer system, resulting in timestamps associated with each word in the audio file and a confidence of each word;
executing, via the at least one processor, the starting ASR neural network model on the audio file using an ASR architecture, resulting in the starting transcript;
generating, via the at least one processor, training data based on:
the timestamps of each word;
the confidence of each word; and
the starting transcript; and
training, via the at least one processor, an updated ASR neural network model using the training data, resulting in the specialized version of the starting ASR neural network model.
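The training-data generation step of claim 1 can be illustrated with a minimal sketch. The claim states only that training data is based on the per-word timestamps, the per-word confidences, and the starting transcript; it does not say how they are combined. The sketch below assumes one plausible reading: runs of consecutive high-confidence words are grouped into audio/text segments, and low-confidence words are dropped. The names `Word`, `TrainingSegment`, `build_training_segments`, `min_confidence`, and `max_gap`, the threshold values, and the sample words are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str          # word taken from the starting transcript
    start: float       # start timestamp, in seconds
    end: float         # end timestamp, in seconds
    confidence: float  # confidence in [0, 1]


@dataclass
class TrainingSegment:
    start: float  # start of the audio span, in seconds
    end: float    # end of the audio span, in seconds
    text: str     # transcript text used as the label for that span


def build_training_segments(
    words: List[Word],
    min_confidence: float = 0.8,  # assumed threshold, not from the patent
    max_gap: float = 0.5,         # assumed maximum pause inside a segment
) -> List[TrainingSegment]:
    """Group runs of consecutive high-confidence words into segments."""
    segments: List[TrainingSegment] = []
    current: List[Word] = []

    def flush() -> None:
        # Close the current run, if any, and emit it as one segment.
        if current:
            segments.append(TrainingSegment(
                start=current[0].start,
                end=current[-1].end,
                text=" ".join(w.text for w in current),
            ))
            current.clear()

    for word in words:
        if word.confidence < min_confidence:
            flush()  # a low-confidence word ends the run and is dropped
            continue
        if current and word.start - current[-1].end > max_gap:
            flush()  # a long pause also ends the run
        current.append(word)
    flush()
    return segments


# Hypothetical per-word output of the analysis step.
words = [
    Word("the", 0.00, 0.20, 0.95),
    Word("court", 0.25, 0.60, 0.91),
    Word("is", 0.65, 0.80, 0.40),
    Word("now", 0.85, 1.10, 0.88),
    Word("in", 1.15, 1.25, 0.93),
    Word("session", 2.50, 3.00, 0.97),
]
segments = build_training_segments(words)
for seg in segments:
    print(f"{seg.start:.2f}-{seg.end:.2f}: {seg.text}")
```

In this sketch each segment pairs a time span in the audio file with the transcript text for that span, which is the kind of audio/label pair a later fine-tuning step could consume. The training step itself is not shown, since the claim does not specify an ASR architecture or training procedure.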