| CPC G10L 15/01 (2013.01) [G10L 15/063 (2013.01)] | 20 Claims |

|
1. A method comprising:
training a transformer learning model on inputs comprising a sequence of audio tokens, the trained transformer learning model being executable by one or more processors of a computing system to output a sequence of feature representations; and
training a quality estimation learning model on inputs comprising the sequence of feature representations, the trained quality estimation learning model being executable by the one or more processors to output a probability of a word error rate value.
|
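The two-stage data flow recited in claim 1 can be illustrated with a minimal, standard-library-only sketch. It is not the claimed implementation, and everything in it is an assumption made for illustration: the sizes `VOCAB`, `DIM`, and `BUCKETS`, the names `encode`, `pool`, and `QualityEstimator`, the mean pooling, and the choice to model "a probability of a word error rate value" as a softmax over discretized WER ranges. The first stage is a frozen stand-in (a random embedding table plus one self-attention pass with no learned projections), so its training is elided. Only the second stage is trained here, with plain cross-entropy gradient steps on toy data.

```python
import math
import random

random.seed(0)

# Hypothetical sizes: audio-token vocabulary, feature width, number of WER ranges.
VOCAB, DIM, BUCKETS = 16, 4, 3


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Stand-in for the trained transformer (assumption: weights are fixed, not trained here).
EMBED = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB)]


def encode(audio_tokens):
    """Map a sequence of audio token ids to a sequence of feature vectors."""
    x = [EMBED[t] for t in audio_tokens]
    out = []
    for q in x:
        # Scaled dot-product attention of this position over all positions.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(DIM) for k in x]
        w = softmax(scores)
        out.append([sum(w[j] * x[j][d] for j in range(len(x))) for d in range(DIM)])
    return out


def pool(features):
    """Mean-pool the feature sequence into a single vector."""
    n = len(features)
    return [sum(f[d] for f in features) / n for d in range(DIM)]


class QualityEstimator:
    """Softmax classifier over WER buckets, trained with cross-entropy."""

    def __init__(self):
        self.w = [[0.0] * DIM for _ in range(BUCKETS)]
        self.b = [0.0] * BUCKETS

    def predict(self, features):
        """Return one probability per WER bucket."""
        v = pool(features)
        logits = [sum(wi * vi for wi, vi in zip(row, v)) + b
                  for row, b in zip(self.w, self.b)]
        return softmax(logits)

    def train_step(self, features, bucket, lr=0.1):
        """One gradient step on the cross-entropy loss for a single example."""
        v = pool(features)
        probs = self.predict(features)
        for k in range(BUCKETS):
            grad = probs[k] - (1.0 if k == bucket else 0.0)
            for d in range(DIM):
                self.w[k][d] -= lr * grad * v[d]
            self.b[k] -= lr * grad


def total_loss(model, data):
    return sum(-math.log(model.predict(encode(toks))[bucket]) for toks, bucket in data)


# Toy data (fabricated for illustration only): token ids paired with a WER bucket label.
data = [([1, 2, 3], 0), ([7, 8, 9, 10], 2)]

model = QualityEstimator()
initial = total_loss(model, data)
for _ in range(200):
    for toks, bucket in data:
        model.train_step(encode(toks), bucket)
final = total_loss(model, data)

probs = model.predict(encode([1, 2, 3]))
print("loss before:", initial, "after:", final)
print("WER bucket probabilities:", probs)
```

In a realistic system the encoder would itself be trained on audio tokens before or alongside the estimator, and the bucket boundaries would be chosen from observed word error rates. The sketch only shows how the second model consumes the first model's output sequence.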