US 12,413,904 B2
	Multi-source audio processing systems and methods
Christopher Charles Nighman, Broomfield, CO (US); Gerrit Eimbertus Rosenboom, Longmont, CO (US); Alfredo Martin Aguilar, Chula Vista, CA (US); and Matthew George Skogmo, Placentia, CA (US)
Assigned to QSC, LLC, Costa Mesa, CA (US)
Filed by QSC, LLC, Costa Mesa, CA (US)
Filed on Oct. 11, 2022, as Appl. No. 17/963,862.
Claims priority of provisional application 63/254,901, filed on Oct. 12, 2021.
Prior Publication US 2023/0115674 A1, Apr. 13, 2023
Int. Cl. H04R 1/40 (2006.01); G10L 21/0208 (2013.01); G10L 21/0272 (2013.01); G10L 21/0316 (2013.01); G10L 25/78 (2013.01); H04B 17/318 (2015.01); H04M 3/56 (2006.01); H04M 7/00 (2006.01); H04M 9/08 (2006.01); H04N 7/15 (2006.01); H04R 3/00 (2006.01); H04R 27/00 (2006.01)

CPC H04R 1/406 (2013.01) [G10L 21/0208 (2013.01); G10L 25/78 (2013.01); H04B 17/318 (2015.01); G10L 2021/02082 (2013.01); G10L 21/0272 (2013.01); G10L 21/0316 (2013.01); H04M 3/56 (2013.01); H04M 3/568 (2013.01); H04M 7/006 (2013.01); H04M 9/082 (2013.01); H04M 2203/509 (2013.01); H04N 7/15 (2013.01); H04R 3/005 (2013.01); H04R 27/00 (2013.01); H04R 2410/01 (2013.01); H04R 2430/23 (2013.01)]

20 Claims

1. A conferencing system comprising:

a plurality of microphones, the plurality of microphones including at least first and second microphones positioned in a conferencing environment, the first microphone configured, responsive to detected sound, to generate at least one first audio signal, the second microphone configured, responsive to detected sound, to generate at least one second audio signal; and

one or more processors executing an audio processing system and operatively coupled to the plurality of microphones, the audio processing system configured to:

perform one or more blind source separation operations on the first audio signal to generate a first plurality of source-separated audio signals each corresponding to a different audio source;

perform one or more blind source separation operations on the second audio signal to generate a second plurality of source-separated audio signals each corresponding to a different audio source;

process a first source-separated signal of the first plurality of source-separated audio signals to identify a first type of audio source corresponding to the first source-separated signal;

process a second source-separated signal of the second plurality of source-separated audio signals to identify a second type of audio source corresponding to the second source-separated signal,

wherein at least one of the first or second types is a non-speech source;

classify the non-speech source into a semantic category;

identify, using an event activity detection technique, the non-speech source as an event, wherein the event corresponds to the semantic category;

in response to the identification of the semantic category and corresponding event, implement a response action by adjusting audio output characteristics or video characteristics of a peripheral communicably coupled to the audio processing system; and

generate an output audio stream including audio content from one or both of the first source-separated signal and the second source-separated signal, the output stream generated at least in part based on one or more of direction of arrival information, signal strength information and the adjusted audio or video characteristics.
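Claim 1 does not name a particular blind source separation algorithm. As an illustration only, the sketch below uses deflation-based FastICA, a widely used BSS technique, on a multichannel capture. It assumes that each "audio signal" is available as several channels, for example from the capsules of one microphone array, because ICA needs at least as many channels as sources. The function name `fastica` and its parameters are hypothetical and do not describe QSC's implementation.

```python
import numpy as np


def fastica(mixed: np.ndarray, n_sources: int, max_iter: int = 200,
            tol: float = 1e-6, seed: int = 0) -> np.ndarray:
    """Separate `mixed` (channels x samples) into `n_sources` estimated sources.

    Deflation-based FastICA with a tanh nonlinearity. As with any blind
    source separation method, output order, sign and scale are arbitrary.
    """
    rng = np.random.default_rng(seed)
    x = mixed - mixed.mean(axis=1, keepdims=True)          # center each channel
    eigvals, eigvecs = np.linalg.eigh(np.cov(x))
    idx = np.argsort(eigvals)[::-1][:n_sources]            # keep strongest components
    whitener = np.diag(1.0 / np.sqrt(eigvals[idx])) @ eigvecs[:, idx].T
    z = whitener @ x                                        # whitened data, n_sources x samples

    w_all = np.zeros((n_sources, n_sources))
    for i in range(n_sources):
        w = rng.standard_normal(n_sources)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            g = np.tanh(w @ z)
            g_prime = 1.0 - g ** 2
            # Fixed-point update: E[z g(w^T z)] - E[g'(w^T z)] w
            w_new = (z * g).mean(axis=1) - g_prime.mean() * w
            # Gram-Schmidt: keep this component orthogonal to those already found
            w_new -= w_all[:i].T @ (w_all[:i] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:
                break
        w_all[i] = w
    return w_all @ z
```

In a system like the one claimed, the same routine would run independently on the first and second microphone captures. Each call yields its own set of source-separated signals, and those signals are then passed to the classification steps.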
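The claim recites identifying a non-speech source as an event "using an event activity detection technique" without specifying that technique. The sketch below is a simple stand-in: a frame-energy detector that reports spans rising a fixed number of decibels above the median (noise-floor) level. It assumes semantic classification happens elsewhere. The name `detect_events` and its thresholds are hypothetical and are not taken from the patent.

```python
import numpy as np


def detect_events(signal, sample_rate: int, frame_ms: float = 20.0,
                  threshold_db: float = 12.0, min_frames: int = 2):
    """Return (start_s, end_s) spans whose frame energy exceeds the median
    frame energy (used as a noise-floor estimate) by `threshold_db`."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        return []
    frames = np.asarray(signal[: n_frames * frame_len], dtype=float)
    frames = frames.reshape(n_frames, frame_len)
    energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    active = energy_db > np.median(energy_db) + threshold_db

    events, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i                                   # event onset
        elif not is_active and start is not None:
            if i - start >= min_frames:                 # ignore very short blips
                events.append((start * frame_len / sample_rate,
                               i * frame_len / sample_rate))
            start = None
    if start is not None and n_frames - start >= min_frames:
        events.append((start * frame_len / sample_rate,
                       n_frames * frame_len / sample_rate))
    return events
```

A deployed detector would more likely combine a trained classifier with this kind of temporal gating. Here, the energy threshold is only meant to show how a separated non-speech stream can be turned into discrete, time-stamped events that a response action can key on.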
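The final claim element bases the output stream on direction of arrival, signal strength, and the adjusted characteristics. One way to picture this is the hypothetical mixer below. It passes speech sources that arrive within a beam around a target direction and exceed a minimum RMS level, and it scales non-speech sources by a configurable gain that stands in for the response action. The `SeparatedSource` fields, the beam and level thresholds, and the gating rule are all illustrative assumptions, not the patented method.

```python
import math
from dataclasses import dataclass


@dataclass
class SeparatedSource:
    samples: list[float]   # one source-separated signal (hypothetical container)
    doa_deg: float         # estimated direction of arrival in degrees (assumed given)
    source_type: str       # "speech" or a non-speech semantic category label


def rms(samples: list[float]) -> float:
    """Root-mean-square level, used here as the signal strength measure."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


def angle_diff_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def mix_output(sources: list[SeparatedSource], target_doa_deg: float,
               beam_width_deg: float = 30.0, min_rms: float = 0.01,
               non_speech_gain: float = 0.0) -> list[float]:
    """Sum separated sources into a single output stream.

    Speech inside the beam and at or above `min_rms` passes at unity gain;
    other speech is dropped. Non-speech sources are scaled by
    `non_speech_gain` (0.0 mutes them), standing in for a response action.
    """
    n = max((len(src.samples) for src in sources), default=0)
    out = [0.0] * n
    for src in sources:
        if src.source_type == "speech":
            in_beam = angle_diff_deg(src.doa_deg, target_doa_deg) <= beam_width_deg / 2
            gain = 1.0 if in_beam and rms(src.samples) >= min_rms else 0.0
        else:
            gain = non_speech_gain
        for i, s in enumerate(src.samples):
            out[i] += gain * s
    return out
```

Because the signals from both microphones feed the same mixer, a talker picked up by either microphone can contribute to the output stream. A detected non-speech event, such as keyboard noise, can be attenuated rather than removed outright by raising `non_speech_gain` above zero.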