| CPC G10L 21/10 (2013.01) [G06F 3/165 (2013.01); G10L 17/00 (2013.01); G10L 21/0232 (2013.01); H04R 1/406 (2013.01); H04R 3/005 (2013.01); G10L 2021/02082 (2013.01)] | 19 Claims |

|
5. A computer-implemented method comprising:
as implemented by one or more computing devices configured with specific executable instructions,
receiving speaker identification information for each of a plurality of audio channels, wherein the speaker identification information for each individual audio channel identifies a person assigned to the individual audio channel, wherein each of the plurality of audio channels are associated with a different microphone of a plurality of microphones;
receiving multi-channel streaming audio on the plurality of audio channels, wherein the multi-channel streaming audio comprises audio signals captured from each of the plurality of microphones on a different channel of the plurality of audio channels, wherein streaming audio of a first channel of the plurality of audio channels includes speech utterances spoken by two or more people;
obtaining a plurality of filtered audio channels that are generated based on the multi-channel streaming audio, such that each of the filtered audio channels includes speech utterances spoken by a different person assigned to the individual corresponding audio channel;
obtaining text results for each of the filtered audio channels by providing audio from each of the filtered audio channels to one or more speech-to-text services or modules, wherein the text results for each of the filtered audio channels represent words spoken by a different speaker; and
generating a transcript of at least a portion of multi-speaker conversation based on the text results, wherein the transcript includes identification of a respective speaker each time that a change in speaker occurs in the transcript.
|
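
The final step of claim 5 (generating a transcript that identifies the speaker each time the speaker changes) can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the speech-to-text step returns timestamped text segments per filtered channel, which the claim does not state. `Segment`, `build_transcript`, and `speaker_by_channel` are hypothetical names introduced here for illustration. The filtering step that isolates each assigned speaker is not shown.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    channel: int   # index of the filtered audio channel the text came from
    start: float   # assumed start time in seconds (not specified by the claim)
    text: str      # text result returned for this stretch of speech


def build_transcript(segments, speaker_by_channel):
    """Merge per-channel text results into one transcript.

    A speaker label is emitted only when the speaker differs from the
    speaker of the previous segment; consecutive segments from the same
    speaker are joined onto the same line.
    """
    lines = []
    current_speaker = None
    # Order all segments by time so the channels interleave as a conversation.
    for seg in sorted(segments, key=lambda s: s.start):
        speaker = speaker_by_channel[seg.channel]
        if speaker != current_speaker:
            lines.append(f"{speaker}: {seg.text}")
            current_speaker = speaker
        else:
            lines[-1] += " " + seg.text
    return "\n".join(lines)


# Hypothetical example: two channels, each assigned to one person.
speaker_by_channel = {0: "Alice", 1: "Bob"}
segments = [
    Segment(channel=1, start=1.5, text="Hi there."),
    Segment(channel=0, start=0.0, text="Hello."),
    Segment(channel=1, start=2.0, text="How are you?"),
    Segment(channel=0, start=3.0, text="Fine."),
]
transcript = build_transcript(segments, speaker_by_channel)
print(transcript)
```

Sorting by start time is one simple way to interleave the channels. A real system would also need to handle overlapping speech and segments with identical timestamps.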