US 12,192,669 B2
Auto focus on speaker during multi-participant communication conferencing
Prateek Singh, Noida (IN); Divyarajsinh Jadeja, Pune (IN); Mittali Jangid, Gujarat (IN); and Dean Beightol, Superior, CO (US)
Assigned to Avaya Management L.P., Durham, NC (US)
Filed by Avaya Management L.P., Durham, NC (US)
Filed on Jun. 8, 2022, as Appl. No. 17/835,120.
Prior Publication US 2023/0403366 A1, Dec. 14, 2023
Int. Cl. H04N 5/262 (2006.01); G06V 20/40 (2022.01); G06V 40/16 (2022.01); G06V 40/20 (2022.01); G10L 17/06 (2013.01); G10L 25/57 (2013.01); H04L 65/403 (2022.01)

CPC H04N 5/2628 (2013.01) [G06V 20/40 (2022.01); G06V 40/161 (2022.01); G06V 40/172 (2022.01); G06V 40/20 (2022.01); G10L 17/06 (2013.01); G10L 25/57 (2013.01); H04L 65/403 (2013.01)]

20 Claims

1. A method, comprising:

receiving video captured of a scene that includes a plurality of images of participants to a communication session;

identifying the plurality of images of the participants in the video captured of the scene based on registered facial prints;

recognizing audio from at least one of the participants to the communication session based on registered voice prints;

detecting facial movement in one of the images of the plurality of images using artificial intelligence;

equating the recognized audio to the detected facial movement in the one of the images of the plurality of images using artificial intelligence;

selecting the one of the images of the plurality of images as a speaker based on the equated recognized audio to the detected facial movement in the one of the images of the plurality of images and the registered facial prints and voice prints;

zooming in on the speaker; and

filtering out a remainder of the images of the plurality of images.
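
The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not specify the artificial-intelligence models, so the sketch assumes that upstream components (not shown) have already matched each face to a registered facial print, matched the audio to a registered voice print, and produced a per-frame facial-movement score. The names FaceTrack, pearson, select_speaker, zoom_crop, and focus_on_speaker, the correlation threshold, and the use of Pearson correlation to "equate" audio with facial movement are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


@dataclass
class FaceTrack:
    """One participant image in the scene (hypothetical structure)."""
    participant_id: str            # identity assumed matched to a registered facial print
    box: Box                       # face location in the captured frame
    mouth_motion: Sequence[float]  # assumed per-frame facial-movement score


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two signals; 0.0 if undefined."""
    n = min(len(a), len(b))
    if n < 2:
        return 0.0
    a, b = a[:n], b[:n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def select_speaker(tracks: List[FaceTrack], audio_energy: Sequence[float],
                   voice_id: str, min_corr: float = 0.5) -> Optional[FaceTrack]:
    """Pick the face whose identity matches the recognized voice and whose
    movement correlates with the audio (a stand-in for the 'equating' step)."""
    best, best_score = None, min_corr
    for track in tracks:
        if track.participant_id != voice_id:
            continue  # facial print and voice print must agree
        score = pearson(track.mouth_motion, audio_energy)
        if score >= best_score:
            best, best_score = track, score
    return best


def zoom_crop(box: Box, frame_size: Tuple[int, int], margin: float = 0.5) -> Box:
    """Expand the face box by a margin and clamp it to the frame."""
    x, y, w, h = box
    frame_w, frame_h = frame_size
    pad_w, pad_h = int(w * margin), int(h * margin)
    left, top = max(0, x - pad_w), max(0, y - pad_h)
    right, bottom = min(frame_w, x + w + pad_w), min(frame_h, y + h + pad_h)
    return (left, top, right - left, bottom - top)


def focus_on_speaker(tracks, audio_energy, voice_id, frame_size):
    """Return (crop box for the speaker, ids of the filtered-out participants),
    or (None, []) when no speaker can be selected."""
    speaker = select_speaker(tracks, audio_energy, voice_id)
    if speaker is None:
        return None, []
    others = [t.participant_id for t in tracks if t is not speaker]
    return zoom_crop(speaker.box, frame_size), others


tracks = [
    FaceTrack("alice", (100, 50, 40, 40), [0.1, 0.9, 0.2, 0.8]),
    FaceTrack("bob", (300, 60, 40, 40), [0.5, 0.5, 0.5, 0.5]),
]
crop, filtered_out = focus_on_speaker(
    tracks, audio_energy=[0.2, 1.0, 0.1, 0.9], voice_id="alice",
    frame_size=(640, 480))
```

In this usage example the audio is attributed to "alice" and her movement signal tracks the audio energy, so her image is selected and cropped while "bob" is listed as filtered out. A real system would replace the correlation with whatever trained model performs the audio-to-movement matching.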