| CPC G06F 40/58 (2020.01) [G10L 13/02 (2013.01); G10L 15/22 (2013.01); G10L 15/26 (2013.01); G10L 25/57 (2013.01)] | 18 Claims |

|
1. A system for providing real-time language interpretation in an audio or video communication session, comprising:
a communications platform for connecting first and second parties in an audio or video communication session, wherein the communications platform receives spoken audio input in a first language from the first party and spoken audio input from the second party;
a transcription module operatively connected to the communications platform that is configured to generate separate transcriptions of the spoken audio input provided by the first and second parties;
a translation module operatively connected to the transcription module and configured to receive a transcription of the first party's spoken audio input from the transcription module and to create a translated transcription of the first party's spoken audio input in a second language; and
a text-to-speech module operatively connected to the translation module that is configured to generate a spoken audio version of the first party's translated transcription in the second language;
wherein the communications platform is configured to provide to the second party, at approximately the same time, both the first party's spoken audio input in the first language and the spoken audio version of the first party's translated transcription in the second language; and
wherein when the second party begins to provide spoken audio input while the communications platform is still providing the first party's translated transcription in the second language to the second party, the communications platform stops providing the first party's translated transcription in the second language to the second party.
|
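
The data flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the class `InterpretationSession`, its methods, the helper `demo_session`, and the stub transcription, translation, and speech-synthesis functions are all hypothetical names introduced here for illustration, and audio is modelled as plain strings. The sketch only shows the ordering of the modules and the interruption behaviour of the final wherein clause, under those assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class InterpretationSession:
    """Hypothetical model of the claimed pipeline; audio is modelled as strings."""

    transcribe: Callable[[str], str]        # spoken audio -> transcription
    translate: Callable[[str], str]         # first-language text -> second-language text
    synthesize: Callable[[str], List[str]]  # translated text -> list of audio chunks
    # Everything the second party has received so far.
    delivered: List[str] = field(default_factory=list, init=False)
    # Translated audio chunks queued but not yet played.
    _pending: List[str] = field(default_factory=list, init=False)

    def first_party_speaks(self, audio: str) -> None:
        # Pass the original audio through, then queue the translated audio.
        self.delivered.append(f"original:{audio}")
        text = self.transcribe(audio)
        self._pending = self.synthesize(self.translate(text))

    def play_next_chunk(self) -> bool:
        # Deliver one chunk of translated audio; return False if none is left.
        if not self._pending:
            return False
        self.delivered.append(f"translated:{self._pending.pop(0)}")
        return True

    def second_party_speaks(self, audio: str) -> str:
        # Interruption: discard translated audio still queued, then transcribe
        # the second party's input separately.
        self._pending.clear()
        return self.transcribe(audio)


def demo_session() -> InterpretationSession:
    # Stub modules (assumptions): identity transcription, word-wise "translation",
    # and one audio chunk per word.
    return InterpretationSession(
        transcribe=lambda audio: audio,
        translate=lambda text: " ".join(f"es({w})" for w in text.split()),
        synthesize=lambda text: text.split(),
    )
```

In this sketch, calling `first_party_speaks` followed by repeated `play_next_chunk` calls delivers the original audio and then the translated chunks; a call to `second_party_speaks` in between empties the queue, so later `play_next_chunk` calls return `False`. A real system would stream both audio tracks concurrently and detect the second party's speech with voice activity detection, neither of which is modelled here.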