US 12,437,763 B2
Methods to employ concatenation in ASR service usage based on speech-to-text cost
Ankur Anil Aher, Maharashtra (IN); and Jeffry Copps Robert Jose, Tamil Nadu (IN)
Assigned to Adeia Guides Inc., San Jose, CA (US)
Appl. No. 17/623,372
Filed by Adeia Guides Inc., San Jose, CA (US)
PCT Filed Dec. 20, 2019, PCT No. PCT/IB2019/001348
§ 371(c)(1), (2) Date Dec. 28, 2021,
PCT Pub. No. WO2021/123853, PCT Pub. Date Jun. 24, 2021.
Prior Publication US 2022/0366912 A1, Nov. 17, 2022
Int. Cl. G10L 15/26 (2006.01); G06F 16/60 (2019.01); G06Q 10/10 (2023.01); G10L 13/033 (2013.01); G10L 15/04 (2013.01); G10L 15/22 (2006.01); G10L 15/30 (2013.01); G10L 15/32 (2013.01)
CPC G10L 15/26 (2013.01) [G10L 15/04 (2013.01); G10L 15/22 (2013.01); G10L 15/30 (2013.01)]
20 Claims
 
1. A method of processing audio streams, the method comprising:
receiving a plurality of audio streams during a time window, each audio stream received from an independent source, including speech content, and having an audio stream duration;
selecting a subset of the plurality of audio streams based on the respective audio stream durations of the plurality of audio streams, the subset comprising N number of audio streams, wherein a combined duration of the N number of audio streams is below a threshold duration, and wherein the threshold duration is based on a transaction cost for speech-to-text conversion;
generating N−1 number of audio stream separators;
concatenating the N number of audio streams to generate a concatenated audio stream;
inserting an audio stream separator between every two adjacent audio streams of the concatenated audio stream to generate a single audio stream payload, each audio stream separator delineating a beginning of a next audio stream and an end of a preceding audio stream in the concatenated audio stream;
transmitting the single audio stream payload from a buffer for transcription of the speech content of the set of audio streams to text content, a transcription of each of the speech content of the set of audio streams; and
in response to transmitting the single audio stream payload for transcription, receiving a text file including the text content delineated with the audio stream separators.
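
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `AudioStream`, `select_subset`, `build_payload`, and `split_transcript` are hypothetical, as are the byte-string separator and the `<sep>` text marker. The sketch also assumes a shortest-first greedy selection and a threshold equal to one billing increment of the speech-to-text service; the claim requires only that the combined duration stay below a threshold based on transaction cost.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical stand-ins: a real system would insert an audio cue (for
# example a tone or a spoken keyword) that the ASR service renders as a
# recognizable token in the returned text.
SEPARATOR_AUDIO = b"\x00SEP\x00"
SEPARATOR_TEXT = "<sep>"


@dataclass
class AudioStream:
    source_id: str     # identifies the independent source
    samples: bytes     # placeholder for the encoded speech audio
    duration_s: float  # audio stream duration in seconds


def select_subset(streams: List[AudioStream], threshold_s: float) -> List[AudioStream]:
    """Pick streams, shortest first, while the combined duration stays
    strictly below threshold_s (assumed: one billing increment)."""
    chosen: List[AudioStream] = []
    total = 0.0
    for stream in sorted(streams, key=lambda s: s.duration_s):
        if total + stream.duration_s < threshold_s:
            chosen.append(stream)
            total += stream.duration_s
    return chosen


def build_payload(subset: List[AudioStream]) -> bytes:
    """Concatenate N streams with N-1 separators, one between each
    adjacent pair, into a single payload."""
    separators = [SEPARATOR_AUDIO] * max(len(subset) - 1, 0)
    payload = bytearray()
    for index, stream in enumerate(subset):
        payload += stream.samples
        if index < len(separators):
            payload += separators[index]
    return bytes(payload)


def split_transcript(text: str, subset: List[AudioStream]) -> Dict[str, str]:
    """Split the returned text at the separator markers and map each
    segment back to the source of the corresponding stream."""
    parts = [part.strip() for part in text.split(SEPARATOR_TEXT)]
    if len(parts) != len(subset):
        raise ValueError("transcript segments do not match stream count")
    return {stream.source_id: part for stream, part in zip(subset, parts)}
```

With a hypothetical 15-second threshold and streams of 4, 9, 5, and 3 seconds, this selection keeps the 3-, 4-, and 5-second streams (12 seconds combined), so three requests share one transaction and the payload carries two separators.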