| CPC H04N 7/15 (2013.01) [G06T 7/70 (2017.01); G10L 25/06 (2013.01); G10L 25/57 (2013.01); G10L 25/78 (2013.01); H04L 65/403 (2013.01); H04N 5/268 (2013.01); H04R 1/406 (2013.01); H04R 3/005 (2013.01); G06T 2207/10016 (2013.01); G06T 2207/30201 (2013.01)] | 20 Claims |

|
1. A method performed by a video conference system having cameras and microphone arrays each co-located with a corresponding one of the cameras, the method comprising:
detecting a face of a participant, and estimating orientations of the face relative to the cameras, based on video captured by the cameras;
receiving, from each microphone array, at least two microphone signals that represent detected audio from the participant;
separately correlating the at least two microphone signals from each microphone array against each other using a correlation function to produce a correlation peak that indicates a time difference of arrival between the at least two microphone signals, wherein separately correlating produces correlation peaks for corresponding ones of the microphone arrays;
determining a preferred camera among the cameras based on the correlation peaks and the orientations of the face relative to the cameras; and
transmitting the video captured by the preferred camera to a network.
|
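The correlating and determining steps of claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. The claim does not say how the correlation peaks and face orientations are combined, so the scoring rule here (normalized peak height multiplied by the cosine of the face yaw relative to each camera) is an assumption made only for illustration. The names `correlation_peak`, `choose_preferred_camera`, `max_lag`, and `face_yaw_deg` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def correlation_peak(sig_a: Sequence[float], sig_b: Sequence[float],
                     max_lag: int) -> Tuple[int, float]:
    """Cross-correlate two microphone signals from one array.

    Returns (lag, peak): lag is the sample offset at which sig_b best
    matches sig_a (a positive lag means sig_b arrives later), and peak is
    the correlation value at that lag, normalized by the signal energies.
    Dividing lag by the sample rate gives the time difference of arrival.
    """
    n = min(len(sig_a), len(sig_b))
    energy = (sum(x * x for x in sig_a[:n]) *
              sum(y * y for y in sig_b[:n])) ** 0.5
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                total += sig_a[i] * sig_b[j]
        val = total / energy if energy > 0 else 0.0
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag, best_val


def choose_preferred_camera(
        mic_pairs: List[Tuple[Sequence[float], Sequence[float]]],
        face_yaw_deg: List[float],
        max_lag: int = 8) -> int:
    """Return the index of the preferred camera.

    mic_pairs[k] holds two signals from the array co-located with camera k;
    face_yaw_deg[k] is the estimated face yaw relative to camera k
    (0 means the participant faces that camera directly).
    ASSUMPTION: score = peak height * cos(yaw); the claim leaves the
    combination rule open.
    """
    scores = []
    for (sig_a, sig_b), yaw in zip(mic_pairs, face_yaw_deg):
        _lag, peak = correlation_peak(sig_a, sig_b, max_lag)
        frontalness = max(0.0, math.cos(math.radians(yaw)))
        scores.append(peak * frontalness)
    return max(range(len(scores)), key=scores.__getitem__)
```

In this sketch each microphone array is correlated separately, producing one peak per array as the claim recites. A real system would more likely use a generalized cross-correlation over short audio frames, and might also use the lag of each peak rather than only its height.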