| CPC G10L 15/063 (2013.01) [G06F 40/284 (2020.01); G06F 40/58 (2020.01); G10L 15/005 (2013.01); G10L 15/16 (2013.01); G10L 15/197 (2013.01); G10L 15/22 (2013.01)] | 20 Claims |

1. A method for implementing an end-to-end automatic speech translation (AST) model with a neural transducer, the method comprising:
accessing a training dataset comprising an audio dataset comprising spoken language utterances in a first language and a text dataset comprising transcription labels in a second language, the transcription labels corresponding to the spoken language utterances;
accessing an end-to-end AST model based on a neural transducer comprising at least an acoustic encoder which is configured to receive and encode audio data, and a prediction network which is integrated in a parallel model architecture with the acoustic encoder in the end-to-end AST model and configured to predict a subsequent language token based on a previous transcription label output;
applying the training dataset to the end-to-end AST model;
generating a transcription in the second language of input audio data in the first language based on the trained end-to-end AST model; and
causing the acoustic encoder to learn a plurality of temporal processing paths.
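Claim 1 recites a transducer-based architecture in which an acoustic encoder and a prediction network operate in parallel. The following is a minimal, hypothetical PyTorch sketch of such a model; all class, function, and parameter names are illustrative assumptions rather than the patented implementation, and the joint network shown is a standard neural-transducer component that is not itself recited in the claim.

import torch
import torch.nn as nn


class AcousticEncoder(nn.Module):
    # Encodes source-language audio features into frame-level acoustic representations.
    def __init__(self, feat_dim: int, hidden_dim: int):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, num_layers=2, batch_first=True)

    def forward(self, audio_feats):                   # audio_feats: (B, T, feat_dim)
        out, _ = self.lstm(audio_feats)               # (B, T, hidden_dim)
        return out


class PredictionNetwork(nn.Module):
    # Predicts the next target-language token from previously emitted label tokens.
    def __init__(self, vocab_size: int, hidden_dim: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)

    def forward(self, prev_labels):                   # prev_labels: (B, U)
        out, _ = self.lstm(self.embed(prev_labels))   # (B, U, hidden_dim)
        return out


class TransducerAST(nn.Module):
    # Runs the acoustic encoder and prediction network in parallel and fuses their
    # outputs with a joint network to produce per-(frame, token) output logits.
    def __init__(self, feat_dim=80, vocab_size=4000, hidden_dim=512):
        super().__init__()
        self.encoder = AcousticEncoder(feat_dim, hidden_dim)
        self.predictor = PredictionNetwork(vocab_size, hidden_dim)
        self.joint = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, vocab_size + 1),    # +1 for the blank symbol
        )

    def forward(self, audio_feats, prev_labels):
        enc = self.encoder(audio_feats).unsqueeze(2)      # (B, T, 1, H)
        pred = self.predictor(prev_labels).unsqueeze(1)   # (B, 1, U, H)
        t, u = enc.size(1), pred.size(2)
        joint_in = torch.cat(
            [enc.expand(-1, -1, u, -1), pred.expand(-1, t, -1, -1)], dim=-1
        )                                                 # (B, T, U, 2H)
        return self.joint(joint_in)                       # (B, T, U, vocab_size + 1)

During training, logits of this shape would typically be scored with a transducer (RNN-T) loss, for example torchaudio.functional.rnnt_loss, which marginalizes over alignments between audio frames in the first language and transcription-label tokens in the second language.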