US 12,387,741 B2
Automated transcript generation from multi-channel audio
Anthony Donofrio, Woodcliff Lake, NJ (US); David Joseph DaSilva, Haskell, NJ (US); James Andrew Maraska, Jr., Morristown, NJ (US); and Jonathan Mordecai Kaplan, West Orange, NJ (US)
Assigned to Veritext, LLC, Livingston, NJ (US)
Filed by Veritext, LLC, Livingston, NJ (US)
Filed on Jun. 2, 2023, as Appl. No. 18/328,620.
Application 18/328,620 is a continuation of application No. 17/175,246, filed on Feb. 12, 2021, granted, now 11,699,456.
Application 17/175,246 is a continuation of application No. 16/179,087, filed on Nov. 2, 2018, granted, now 10,930,300, issued on Feb. 23, 2021.
Prior Publication US 2024/0153523 A1, May 9, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 21/10 (2013.01); G06F 3/16 (2006.01); G10L 17/00 (2013.01); G10L 21/0232 (2013.01); H04R 1/40 (2006.01); H04R 3/00 (2006.01); G10L 21/0208 (2013.01)

CPC G10L 21/10 (2013.01) [G06F 3/165 (2013.01); G10L 17/00 (2013.01); G10L 21/0232 (2013.01); H04R 1/406 (2013.01); H04R 3/005 (2013.01); G10L 2021/02082 (2013.01)]

19 Claims

5. A computer-implemented method comprising:

as implemented by one or more computing devices configured with specific executable instructions,

receiving speaker identification information for each of a plurality of audio channels, wherein the speaker identification information for each individual audio channel identifies a person assigned to the individual audio channel, wherein each of the plurality of audio channels is associated with a different microphone of a plurality of microphones;

receiving multi-channel streaming audio on the plurality of audio channels, wherein the multi-channel streaming audio comprises audio signals captured from each of the plurality of microphones on a different channel of the plurality of audio channels, wherein streaming audio of a first channel of the plurality of audio channels includes speech utterances spoken by two or more people;

obtaining a plurality of filtered audio channels that are generated based on the multi-channel streaming audio, such that each of the filtered audio channels includes speech utterances spoken by a different person assigned to the individual corresponding audio channel;

obtaining text results for each of the filtered audio channels by providing audio from each of the filtered audio channels to one or more speech-to-text services or modules, wherein the text results for each of the filtered audio channels represent words spoken by a different speaker; and

generating a transcript of at least a portion of multi-speaker conversation based on the text results, wherein the transcript includes identification of a respective speaker each time that a change in speaker occurs in the transcript.
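The first step of claim 5 pairs each audio channel with one microphone and one assigned person. A minimal sketch of that configuration follows. The `ChannelAssignment` record, the `build_speaker_map` helper and the participant names are hypothetical; the sketch only enforces the two constraints the claim states (one person per channel, a different microphone per channel).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelAssignment:
    """One audio channel tied to one microphone and one person (hypothetical record)."""
    channel: int
    microphone_id: str
    speaker_name: str


def build_speaker_map(assignments: list[ChannelAssignment]) -> dict[int, str]:
    """Return {channel: speaker_name}; reject reused channels or shared microphones."""
    speaker_map: dict[int, str] = {}
    seen_mics: set[str] = set()
    for a in assignments:
        if a.channel in speaker_map:
            raise ValueError(f"channel {a.channel} is assigned more than once")
        if a.microphone_id in seen_mics:
            raise ValueError(f"microphone {a.microphone_id!r} feeds more than one channel")
        seen_mics.add(a.microphone_id)
        speaker_map[a.channel] = a.speaker_name
    return speaker_map


# Usage with made-up participants.
speaker_map = build_speaker_map([
    ChannelAssignment(0, "mic-1", "MR. SMITH"),
    ChannelAssignment(1, "mic-2", "THE WITNESS"),
])
print(speaker_map)  # {0: 'MR. SMITH', 1: 'THE WITNESS'}
```

Later steps only need the resulting channel-to-speaker lookup, so the microphone identifiers are used solely for validation.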
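The receiving step of claim 5 only requires that each microphone's signal arrive on its own channel. Many multi-channel sources deliver audio as interleaved frames (one sample per channel, repeated), so a receiver would first split each streamed chunk into per-channel buffers. The sketch below assumes that interleaved layout and integer PCM samples; the `deinterleave` name is hypothetical.

```python
from typing import Sequence


def deinterleave(samples: Sequence[int], num_channels: int) -> list[list[int]]:
    """Split an interleaved chunk [c0, c1, ..., c0, c1, ...] into one sample list per channel."""
    if num_channels <= 0:
        raise ValueError("num_channels must be positive")
    if len(samples) % num_channels:
        raise ValueError("chunk does not contain a whole number of frames")
    # Channel k owns every num_channels-th sample, starting at offset k.
    return [list(samples[ch::num_channels]) for ch in range(num_channels)]


# A 3-frame chunk from two microphones (made-up sample values).
chunk = [10, -1, 11, -2, 12, -3]
print(deinterleave(chunk, 2))  # [[10, 11, 12], [-1, -2, -3]]
```

Each de-interleaved channel may still carry more than one voice, which is the situation the claim describes for its first channel: a microphone also picks up nearby speakers.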
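Claim 5 does not say how the filtered channels are produced, only that each ends up holding the speech of its assigned person. One simple heuristic, swapped in here purely for illustration and not taken from the patent, is dominant-channel gating. For each short frame, keep the audio only on the channel whose microphone is loudest, on the assumption that a talker's own microphone is closest to them, and silence the crosstalk on the other channels. The frame length and all names below are assumptions.

```python
import math


def frame_rms(signal: list[float], start: int, length: int) -> float:
    """Root-mean-square energy of signal[start:start + length]."""
    frame = signal[start:start + length]
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def gate_by_dominant_channel(channels: list[list[float]], frame_len: int) -> list[list[float]]:
    """Per frame, keep audio only on the highest-energy channel and zero the others."""
    n = len(channels[0])
    if any(len(c) != n for c in channels):
        raise ValueError("all channels must have the same length")
    filtered = [[0.0] * n for _ in channels]
    for start in range(0, n, frame_len):
        energies = [frame_rms(c, start, frame_len) for c in channels]
        # Ties go to the lowest channel index, because max() returns the first maximum.
        loudest = max(range(len(channels)), key=energies.__getitem__)
        if energies[loudest] == 0.0:
            continue  # frame is silent on every channel
        end = min(start + frame_len, n)
        filtered[loudest][start:end] = channels[loudest][start:end]
    return filtered


# Two frames of two samples each: speaker 0 talks first, then speaker 1.
ch0 = [1.0, 1.0, 0.1, 0.1]
ch1 = [0.2, 0.2, 0.9, 0.9]
print(gate_by_dominant_channel([ch0, ch1], frame_len=2))
# [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.9, 0.9]]
```

Hard gating discards overlapping speech from the quieter talker; adaptive crosstalk or echo cancellation would be a more robust choice in practice.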
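In the speech-to-text step of claim 5, each filtered channel is sent to a recognizer separately, so every word it returns can be attributed to that channel's assigned person. The sketch below assumes a recognizer that returns `(start_seconds, word)` pairs. That interface, the `TimedWord` record and `fake_recognize` are hypothetical stand-ins, not the API of any particular speech-to-text service.

```python
from typing import Callable, NamedTuple, Sequence


class TimedWord(NamedTuple):
    start: float   # seconds from the start of the recording
    speaker: str
    text: str


# Hypothetical recognizer interface: audio in, (start_seconds, word) pairs out.
Recognizer = Callable[[Sequence[float]], list[tuple[float, str]]]


def transcribe_channels(filtered: Sequence[Sequence[float]],
                        speaker_map: dict[int, str],
                        recognize: Recognizer) -> list[TimedWord]:
    """Send each filtered channel to the recognizer and tag its words with that channel's speaker."""
    results: list[TimedWord] = []
    for channel, audio in enumerate(filtered):
        speaker = speaker_map[channel]
        for start, text in recognize(audio):
            results.append(TimedWord(start, speaker, text))
    return results


def fake_recognize(audio: Sequence[float]) -> list[tuple[float, str]]:
    """Stand-in for a real service: one 'word' per nonzero sample, one second apart."""
    return [(float(i), f"w{i}") for i, x in enumerate(audio) if x != 0.0]


words = transcribe_channels([[0.5, 0.0], [0.0, 0.7]],
                            {0: "MR. SMITH", 1: "THE WITNESS"},
                            fake_recognize)
```

Because speaker identity comes from the channel rather than from voice recognition, the recognizer itself never needs to know who is talking.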
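The final step of claim 5 merges the per-speaker text results into one transcript that names the speaker each time the speaker changes. A minimal sketch follows, assuming each word carries a start time and a speaker label (word-level timestamps are an assumption about the speech-to-text output). It orders all words by time and opens a new labeled line on each change of speaker. `TimedWord`, `build_transcript` and the sample dialogue are hypothetical.

```python
from typing import Iterable, NamedTuple, Optional


class TimedWord(NamedTuple):
    start: float   # seconds from the start of the recording
    speaker: str
    text: str


def build_transcript(words: Iterable[TimedWord]) -> str:
    """Order words by start time and emit a speaker label each time the speaker changes."""
    lines: list[str] = []
    speaker: Optional[str] = None
    turn: list[str] = []
    # sorted() is stable, so words with equal start times keep their input order.
    for word in sorted(words, key=lambda w: w.start):
        if word.speaker != speaker:
            if turn:
                lines.append(f"{speaker}: {' '.join(turn)}")
            speaker, turn = word.speaker, []
        turn.append(word.text)
    if turn:
        lines.append(f"{speaker}: {' '.join(turn)}")
    return "\n".join(lines)


words = [
    TimedWord(1.4, "THE WITNESS", "Yes."),
    TimedWord(0.0, "MR. SMITH", "Did"),
    TimedWord(0.3, "MR. SMITH", "you"),
    TimedWord(0.6, "MR. SMITH", "sign?"),
    TimedWord(2.1, "MR. SMITH", "When?"),
]
print(build_transcript(words))
# MR. SMITH: Did you sign?
# THE WITNESS: Yes.
# MR. SMITH: When?
```

Sorting by word start time is the simplest merge rule. Where two people talk over each other, their words interleave and produce several short turns, so a production system might merge by utterance rather than by word.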