| CPC G10L 15/26 (2013.01) [G10L 15/04 (2013.01); G10L 15/22 (2013.01); G10L 15/30 (2013.01)] | 20 Claims |

|
1. A method of processing audio streams, the method comprising:
receiving a plurality of audio streams during a time window, each audio stream received from an independent source, including speech content, and having an audio stream duration;
selecting a subset of the plurality of audio streams based on the respective audio stream durations of the plurality of audio streams, the subset comprising N number of audio streams, wherein a combined duration of the N number of audio streams is below a threshold duration, and wherein the threshold duration is based on a transaction cost for speech-to-text conversion;
generating N−1 number of audio stream separators;
concatenating the N number of audio streams to generate a concatenated audio stream;
inserting an audio stream separator between every two adjacent audio streams of the concatenated audio stream to generate a single audio stream payload, each audio stream separator delineating a beginning of a next audio stream and an end of a preceding audio stream in the concatenated audio stream;
transmitting the single audio stream payload from a buffer for transcription of the speech content of the set of audio streams to text content, a transcription of each of the speech content of the set of audio streams; and
in response to transmitting the single audio stream payload for transcription, receiving a text file including the text content delineated with the audio stream separators.
|
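The steps of claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. Every name in it (`AudioStream`, `select_subset`, `build_payload`, `split_transcript`) is hypothetical. The claim states only that selection is based on stream durations, so the shortest-first greedy rule is an assumption. The sketch also assumes that each separator is a short audio clip which the transcription service renders as a known text token, shown here as `<SEP>`, and that audio is held as raw bytes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AudioStream:
    source_id: str     # identifies the independent source (hypothetical field)
    samples: bytes     # raw audio bytes; the encoding is an assumption
    duration_s: float  # audio stream duration in seconds


def select_subset(streams: List[AudioStream], threshold_s: float) -> List[AudioStream]:
    """Pick streams shortest-first while the combined duration stays below the threshold.

    The greedy rule is an assumption; the claim only requires that selection
    depend on the durations and that the combined duration be below the threshold.
    """
    chosen: List[AudioStream] = []
    total = 0.0
    for stream in sorted(streams, key=lambda s: s.duration_s):
        if total + stream.duration_s >= threshold_s:
            break  # every remaining stream is at least this long
        chosen.append(stream)
        total += stream.duration_s
    return chosen


def build_payload(streams: List[AudioStream], separator: bytes) -> bytes:
    """Concatenate N streams with N-1 separators into a single payload."""
    separators = [separator] * max(len(streams) - 1, 0)
    parts: List[bytes] = []
    for index, stream in enumerate(streams):
        parts.append(stream.samples)
        if index < len(separators):
            # Marks the end of this stream and the beginning of the next one.
            parts.append(separators[index])
    return b"".join(parts)


def split_transcript(text: str, separator_token: str) -> List[str]:
    """Recover one transcription per stream from the delineated text content."""
    return [segment.strip() for segment in text.split(separator_token)]


if __name__ == "__main__":
    received = [
        AudioStream("caller-a", b"AAA", 25.0),
        AudioStream("caller-b", b"BBB", 10.0),
        AudioStream("caller-c", b"CCC", 40.0),
    ]
    subset = select_subset(received, threshold_s=60.0)
    payload = build_payload(subset, separator=b"|")
    print([s.source_id for s in subset], payload)
    print(split_transcript("good morning <SEP> hello there", "<SEP>"))
```

In this sketch the threshold would be derived from the per-transaction cost of the speech-to-text service, for example the longest duration billed as a single request. How that value is computed is outside the sketch.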