CPC H04N 5/2628 (2013.01) [G06V 20/40 (2022.01); G06V 40/161 (2022.01); G06V 40/172 (2022.01); G06V 40/20 (2022.01); G10L 17/06 (2013.01); G10L 25/57 (2013.01); H04L 65/403 (2013.01)]
20 Claims
1. A method, comprising:
receiving video captured of a scene that includes a plurality of images of participants to a communication session;
identifying the plurality of images of the participants in the video captured of the scene based on registered facial prints;
recognizing audio from at least one of the participants to the communication session based on registered voice prints;
detecting facial movement in one of the images of the plurality of images using artificial intelligence;
equating the recognized audio to the detected facial movement in the one of the images of the plurality of images using artificial intelligence;
selecting the one of the images of the plurality of images as a speaker based on the equated recognized audio to the detected facial movement in the one of the images of the plurality of images and the registered facial prints and voice prints;
zooming in on the speaker; and
filtering out a remainder of the images of the plurality of images.
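The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: the claim recites artificial intelligence for detecting facial movement and equating it to audio, whereas this sketch substitutes a plain Pearson correlation between a per-frame mouth-movement score and an audio energy envelope. It also assumes that face-print and voice-print matching have already produced a participant name. Every name here (`Participant`, `select_speaker`, `zoom_box`, `frame_speaker`, `min_score`, `margin`) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class Participant:
    """One participant image identified in the frame (hypothetical structure)."""
    name: str                      # identity assumed to come from a registered facial print
    face_box: Box                  # location of the face in the frame
    mouth_motion: Sequence[float]  # assumed per-frame facial-movement score


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 for empty, mismatched, or constant input."""
    n = len(xs)
    if n == 0 or n != len(ys):
        return 0.0
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_speaker(participants: Sequence[Participant],
                   audio_energy: Sequence[float],
                   voice_match: Optional[str],
                   min_score: float = 0.5) -> Optional[Participant]:
    """Pick the participant whose facial movement best tracks the audio.

    voice_match is the name assumed to come from voice-print recognition;
    when given, only the participant with that name is considered.
    """
    best, best_score = None, min_score
    for p in participants:
        if voice_match is not None and p.name != voice_match:
            continue
        score = pearson(p.mouth_motion, audio_energy)
        if score > best_score:
            best, best_score = p, score
    return best


def zoom_box(face_box: Box, frame_size: Tuple[int, int], margin: float = 0.5) -> Box:
    """Grow the face box by a margin and clamp it to the frame bounds."""
    left, top, right, bottom = face_box
    width, height = frame_size
    pad_x = int((right - left) * margin)
    pad_y = int((bottom - top) * margin)
    return (max(0, left - pad_x), max(0, top - pad_y),
            min(width, right + pad_x), min(height, bottom + pad_y))


def frame_speaker(participants: Sequence[Participant],
                  audio_energy: Sequence[float],
                  voice_match: Optional[str],
                  frame_size: Tuple[int, int]) -> Optional[Tuple[Box, List[str]]]:
    """Return the crop box for the speaker and the names filtered out."""
    speaker = select_speaker(participants, audio_energy, voice_match)
    if speaker is None:
        return None
    filtered = [p.name for p in participants if p is not speaker]
    return zoom_box(speaker.face_box, frame_size), filtered
```

In this sketch, cropping the frame to the returned box stands in for both the zooming and the filtering steps, since the remaining participant images fall outside the crop. A real system would more likely use a learned audio-visual model in place of the correlation and the fixed threshold.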