CPC G10L 15/187 (2013.01) [G06N 20/00 (2019.01); G10L 15/02 (2013.01); G10L 15/063 (2013.01); G10L 15/22 (2013.01); G10L 2015/025 (2013.01)] | 20 Claims |
1. A computing system comprising:
one or more processors; and
one or more computer-readable instructions that are executable by the one or more processors to configure the computing system to at least:
obtain a first training data set comprising labeled speech data or both labeled and unlabeled data corresponding to a high-resource data set, as well as latent speech representations based on the first training data set;
train a machine learning model on the first training data set to learn phonetically aware speech representations corresponding to the first training data set;
apply the latent speech representations from the machine learning model to a transformer context network to generate contextual representations;
align each contextual representation included in the contextual representations with a phoneme label to generate phonetically-aware contextual representations;
cause a refinement engine to further refine the machine learning model based on a refinement dataset, wherein the refinement engine fine-tunes the machine learning model on a limited labeled dataset corresponding to a low-resource target language and/or target domain; and
transform at least some of the contextual representations by randomly replacing a sub-set of the contextual representations with quantized latent speech representations.
|
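By way of non-limiting illustration, the final transforming step of claim 1 (randomly replacing a sub-set of the contextual representations with quantized latent speech representations) might be sketched as follows. The function name `swap_with_quantized`, the per-frame replacement probability `replace_prob`, the `seed` argument, and the use of plain Python lists as frame vectors are all assumptions introduced for this sketch; the claim does not specify how the sub-set is selected or how the representations are stored.

```python
import random


def swap_with_quantized(contextual, quantized, replace_prob=0.1, seed=None):
    """Return (mixed, replaced_indices).

    `contextual` and `quantized` are equal-length sequences of frame vectors.
    Each frame of `contextual` is independently replaced, with probability
    `replace_prob` (a hypothetical parameter), by the quantized latent vector
    at the same time step.
    """
    if len(contextual) != len(quantized):
        raise ValueError("sequences must have the same number of frames")

    rng = random.Random(seed)  # local generator keeps the sketch reproducible
    mixed = []
    replaced_indices = []
    for t, (ctx_frame, q_frame) in enumerate(zip(contextual, quantized)):
        if rng.random() < replace_prob:
            mixed.append(list(q_frame))  # take the quantized latent frame
            replaced_indices.append(t)
        else:
            mixed.append(list(ctx_frame))  # keep the contextual frame
    return mixed, replaced_indices


# Toy input: three frames of two-dimensional vectors (illustrative values only).
contextual = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
quantized = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
mixed, replaced_indices = swap_with_quantized(
    contextual, quantized, replace_prob=0.5, seed=0
)
```

Independent per-frame sampling is only one way to pick a random sub-set; sampling a fixed number of frames or contiguous spans would equally fit the claim language.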