CPC H04R 1/406 (2013.01) [G10L 21/0208 (2013.01); G10L 25/78 (2013.01); H04B 17/318 (2015.01); G10L 2021/02082 (2013.01); G10L 21/0272 (2013.01); G10L 21/0316 (2013.01); H04M 3/56 (2013.01); H04M 3/568 (2013.01); H04M 7/006 (2013.01); H04M 9/082 (2013.01); H04M 2203/509 (2013.01); H04N 7/15 (2013.01); H04R 3/005 (2013.01); H04R 27/00 (2013.01); H04R 2410/01 (2013.01); H04R 2430/23 (2013.01)]
20 Claims

1. A conferencing system comprising:
a plurality of microphones, the plurality of microphones including at least first and second microphones positioned in a conferencing environment, the first microphone configured, responsive to detected sound, to generate at least one first audio signal, the second microphone configured, responsive to detected sound, to generate at least one second audio signal; and
one or more processors executing an audio processing system and operatively coupled to the plurality of microphones, the audio processing system configured to:
perform one or more blind source separation operations on the first audio signal to generate a first plurality of source-separated audio signals each corresponding to a different audio source;
perform one or more blind source separation operations on the second audio signal to generate a second plurality of source-separated audio signals each corresponding to a different audio source;
process a first source-separated signal of the first plurality of source-separated audio signals to identify a first type of audio source corresponding to the first source-separated signal;
process a second source-separated signal of the second plurality of source-separated audio signals to identify a second type of audio source corresponding to the second source-separated signal,
wherein at least one of the first or second types is a non-speech source;
classify the non-speech source into a semantic category;
identify, using an event activity detection technique, the non-speech source as an event, wherein the event corresponds to the semantic category;
in response to the identification of the semantic category and corresponding event, implement a response action by adjusting audio output characteristics or video characteristics of a peripheral communicably coupled to the audio processing system; and
generate an output audio stream including audio content from one or both of the first source-separated signal and the second source-separated signal, the output audio stream generated at least in part based on one or more of direction of arrival information, signal strength information, and the adjusted audio or video characteristics.
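Claim 1 relies on direction of arrival information when generating the output stream but does not say how that information is obtained. One common approach for a pair of microphones, used here purely as an illustrative assumption and not as the claimed method, is to estimate the time difference of arrival τ from the peak of the cross-correlation between the two signals and convert it to a far-field angle with θ = arcsin(c·τ / d), where c is the speed of sound and d is the microphone spacing. The function names `best_lag` and `doa_degrees` are hypothetical, and the sketch assumes time-aligned sample lists, a known sample rate, and a known spacing.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximately the value in air at 20 °C


def best_lag(x: list[float], y: list[float], max_lag: int) -> int:
    """Return the lag (in samples) that maximizes sum(x[n] * y[n + lag]).

    A positive lag means the sound reached the microphone producing x first.
    """
    best, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = sum(x[n] * y[n + lag] for n in range(len(x)) if 0 <= n + lag < len(y))
        if score > best_score:
            best, best_score = lag, score
    return best


def doa_degrees(x: list[float], y: list[float], sample_rate: float, mic_spacing: float) -> float:
    """Far-field direction of arrival, in degrees from broadside, for a two-microphone pair."""
    # Physically possible delays are bounded by the acoustic travel time across the pair.
    max_lag = int(mic_spacing / SPEED_OF_SOUND * sample_rate) + 1
    tau = best_lag(x, y, max_lag) / sample_rate
    # Clamp so that lag quantization cannot push the ratio outside the domain of asin.
    ratio = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / mic_spacing))
    return math.degrees(math.asin(ratio))
```

For example, at a 16 kHz sample rate with 0.1 m spacing, a two-sample delay maps to roughly 25 degrees off broadside. A deployed system would more likely use a frequency-domain estimator such as GCC-PHAT, which holds up better under reverberation than a plain time-domain cross-correlation.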
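The last two steps of claim 1, the event-driven response action and the generation of the output stream from direction of arrival, signal strength, and the adjusted characteristics, can be illustrated with a small mixing sketch. Every detail here is a hypothetical assumption rather than the claimed implementation: the `SeparatedSignal` structure, the `RESPONSE_GAINS` table and its category names, the ±60 degree talker zone, the RMS floor used as the signal strength test, and the choice to model the response action as a gain on the output stream (standing in for adjusting a peripheral's audio output level). Blind source separation, classification, and event detection are taken as already done, so each signal arrives with a type label and a direction estimate.

```python
import math
from dataclasses import dataclass

# Hypothetical mapping from a detected non-speech semantic category to an output
# gain; both the category names and the gain values are illustrative only.
RESPONSE_GAINS = {"keyboard_typing": 0.5, "siren": 0.25}


@dataclass
class SeparatedSignal:
    samples: list[float]  # one source-separated audio signal
    source_type: str      # "speech" or a non-speech semantic category
    doa_deg: float        # direction of arrival in degrees, assumed already estimated


def rms(samples: list[float]) -> float:
    """Root-mean-square level, used here as the signal strength measure."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def response_gain(signals: list[SeparatedSignal]) -> float:
    """Return the strongest attenuation among detected events (1.0 means no adjustment)."""
    gains = [RESPONSE_GAINS[s.source_type] for s in signals if s.source_type in RESPONSE_GAINS]
    return min(gains, default=1.0)


def mix_output(signals: list[SeparatedSignal],
               zone: tuple[float, float] = (-60.0, 60.0),
               min_rms: float = 0.01) -> list[float]:
    """Sum speech signals arriving from inside the talker zone with enough energy,
    then apply the event-driven gain to the mixed output stream."""
    gain = response_gain(signals)
    length = max((len(s.samples) for s in signals), default=0)
    out = [0.0] * length
    for s in signals:
        if s.source_type != "speech":
            continue  # non-speech sources are never mixed; known categories only set the gain
        if not (zone[0] <= s.doa_deg <= zone[1]) or rms(s.samples) < min_rms:
            continue  # reject talkers outside the zone or too quiet to matter
        for i, sample in enumerate(s.samples):
            out[i] += sample
    return [gain * sample for sample in out]
```

Given one in-zone speech signal, one speech signal arriving from 120 degrees, and one `keyboard_typing` signal, `mix_output` returns only the in-zone speech scaled by 0.5. Keeping the gain computation separate from source selection loosely mirrors the claim, where the adjusted characteristics are one input to output generation alongside direction and strength.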