| CPC G06V 40/166 (2022.01) [G06T 7/70 (2017.01); G06V 10/12 (2022.01); G06V 10/56 (2022.01); G06V 10/74 (2022.01); G06V 20/40 (2022.01); G06V 20/46 (2022.01); G06V 40/103 (2022.01); G06V 40/168 (2022.01); G06V 40/20 (2022.01); G10L 17/00 (2013.01); G10L 25/57 (2013.01); H04N 5/268 (2013.01); G06T 2207/10016 (2013.01); G06T 2207/30168 (2013.01); G06T 2207/30196 (2013.01)] | 47 Claims |

|
1. A multi-camera system, comprising:
a plurality of cameras configured to generate a plurality of video output streams representative of a meeting environment including a plurality of meeting participants, wherein representations of the plurality of meeting participants are captured across the plurality of video output streams; and
a video processing unit configured to:
detect, using machine learning, a number of meeting participants present in the meeting environment and a location of each of the plurality of meeting participants;
automatically analyze each video output stream of the plurality of video output streams, based on at least one identity indicator, the number of meeting participants, and the location of each meeting participant, to determine whether one or more representations of one or more meeting participants in each video output stream and another one or more representations of one or more meeting participants in another video output stream correspond to one or more common meeting participants by:
determining one or more distances between a first one or more feature vectors, each feature vector of the first one or more feature vectors corresponding to a representation of a meeting participant in a first video output stream, and a second one or more feature vectors, each feature vector of the second one or more feature vectors corresponding to a representation of a meeting participant in a second video output stream; and
associating, based on the determined one or more distances, a first feature vector of the first one or more feature vectors and a second feature vector of the second one or more feature vectors with a common meeting participant, wherein the determined one or more distances is less than a predetermined threshold;
evaluate a first representation from a third video output stream and a second representation from a fourth video output stream of a particular common meeting participant relative to one or more predetermined criteria;
select, based on the evaluation, either the third video output stream or the fourth video output stream as a source of a framed representation of the particular common meeting participant to be output as a primary video stream; and
generate, as an output of the multi-camera system, the primary video stream including the framed representation of the particular common meeting participant.
|
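The matching and selection steps recited in claim 1 can be illustrated with a minimal sketch. The claim does not specify a distance metric, an assignment strategy, or the predetermined criteria, so everything here is an assumption made for illustration: Euclidean distance, a greedy one-to-one pairing, and a single hypothetical quality score per stream (for example, how frontal the participant's face appears). The names `associate_participants` and `select_source_stream` are hypothetical and do not come from the patent.

```python
import math


def associate_participants(first_vectors, second_vectors, threshold):
    """Pair feature vectors from two streams whose distance is below threshold.

    Assumption: Euclidean distance and greedy closest-first pairing; the claim
    only requires that the distance be less than a predetermined threshold.
    Returns a list of (first_index, second_index, distance) tuples.
    """
    candidates = []
    for i, u in enumerate(first_vectors):
        for j, v in enumerate(second_vectors):
            d = math.dist(u, v)
            if d < threshold:
                candidates.append((d, i, j))

    # Closest pairs are associated first; each vector is used at most once.
    candidates.sort()
    used_first, used_second, pairs = set(), set(), []
    for d, i, j in candidates:
        if i not in used_first and j not in used_second:
            pairs.append((i, j, d))
            used_first.add(i)
            used_second.add(j)
    return pairs


def select_source_stream(scores):
    """Pick the stream whose representation scores highest.

    Assumption: `scores` maps a stream name to one number summarizing the
    predetermined criteria, where higher is better.
    """
    return max(scores, key=scores.get)


# Two participants seen by a first camera, three detections in a second camera.
first_stream = [[0.0, 1.0], [1.0, 0.0]]
second_stream = [[0.9, 0.1], [0.1, 0.9], [5.0, 5.0]]

pairs = associate_participants(first_stream, second_stream, threshold=0.5)
common = sorted((i, j) for i, j, _ in pairs)

# Hypothetical per-stream scores for one common participant.
primary = select_source_stream({"third_stream": 0.4, "fourth_stream": 0.8})
```

In this sketch the third detection in the second stream stays unmatched because no distance to it falls below the threshold. A production system would likely use a learned embedding and a global assignment method rather than greedy pairing.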