US 12,272,003 B2
Videoconference method and videoconference system
Carsten Kraus, Pforzheim (DE)
Assigned to Casablanca.ai GmbH, Pforzheim (DE)
Appl. No. 18/010,654
Filed by Casablanca.ai GmbH, Pforzheim (DE)
PCT Filed Jun. 17, 2021, PCT No. PCT/EP2021/066522 § 371(c)(1), (2) Date Dec. 15, 2022, PCT Pub. No. WO2021/255211, PCT Pub. Date Dec. 23, 2021.
Claims priority of application No. 20181006 (EP), filed on Jun. 19, 2020.
Prior Publication US 2023/0139989 A1, May 4, 2023
Int. Cl. G06T 19/00 (2011.01); G06F 3/01 (2006.01); G06T 7/73 (2017.01); G06T 15/04 (2011.01); G06T 17/00 (2006.01); H04L 12/18 (2006.01); H04N 7/14 (2006.01); H04N 7/15 (2006.01); G06V 40/16 (2022.01)

CPC G06T 19/00 (2013.01) [G06F 3/012 (2013.01); G06F 3/013 (2013.01); G06T 7/75 (2017.01); G06T 15/04 (2013.01); G06T 17/00 (2013.01); G06T 2207/10016 (2013.01); G06T 2207/10024 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/30168 (2013.01); G06T 2207/30201 (2013.01); G06T 2219/024 (2013.01); G06V 40/174 (2022.01); H04L 12/1813 (2013.01)]

15 Claims

1. Video conferencing method, in which

first video image data are reproduced by a first video conferencing device by means of a first display device and at least a region of the head of a first user comprising the eyes is captured by a first image capture device in a position in which the first user is looking at the video image data reproduced by the first display device, the video image data reproduced by the first display device comprising at least a depiction of the eyes of a second user captured by a second image capture device of a second video conferencing device arranged remotely from the first video conferencing device;

a processing unit receives and modifies the video image data of at least the region of the head of the first user comprising the eyes, captured by the first image capture device, and the modified video image data are transmitted to and reproduced by a second display device of the second video conferencing device,

the direction of gaze of the first user being detected during the processing of the video image data and, in the video image data, at least the reproduction of the region of the head of the first user comprising the eyes then being modified so that a target direction of gaze of the first user depicted in the modified video image data appears as if the first image capture device were arranged on a straight line passing through a first surrounding region of the eyes of the first user and through a second surrounding region of the eyes of the second user depicted on the first display device,
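The geometric condition in this limitation can be illustrated with a small calculation. The following is only a minimal sketch under stated assumptions, not the method of the patent: it assumes that the eye region of the first user, the depicted eyes of the second user on the first display device, and the first image capture device are each reduced to a single 3D point in a common coordinate system (metres, display plane at z = 0). All function names and the example coordinates are hypothetical.

```python
import math

def sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def angle_between_deg(u, v):
    """Angle between two non-zero vectors, in degrees."""
    cos_t = sum(x * y for x, y in zip(u, v)) / (norm(u) * norm(v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def gaze_correction_angle_deg(eyes_first_user, depicted_eyes, camera):
    """Angle between the line 'first user's eyes -> depicted eyes of the
    second user' and the line 'first user's eyes -> real camera'.
    It is zero when the camera already lies on the straight line."""
    line_of_sight = sub(depicted_eyes, eyes_first_user)
    to_camera = sub(camera, eyes_first_user)
    return angle_between_deg(line_of_sight, to_camera)

def point_on_line(eyes_first_user, depicted_eyes, t):
    """Point E1 + t * (E2 - E1) on the straight line through both eye regions;
    any such point is a candidate position for the virtual camera."""
    d = sub(depicted_eyes, eyes_first_user)
    return tuple(p + t * q for p, q in zip(eyes_first_user, d))

# Hypothetical setup: user sits 0.6 m in front of the display,
# the camera is mounted 0.15 m above the depicted eyes.
eyes_first_user = (0.0, 0.0, 0.6)
depicted_eyes = (0.0, 0.0, 0.0)
camera = (0.0, 0.15, 0.0)

angle = gaze_correction_angle_deg(eyes_first_user, depicted_eyes, camera)
virtual_camera = point_on_line(eyes_first_user, depicted_eyes, 1.0)
print(round(angle, 2), virtual_camera)
```

In this example the real camera sees the first user from about 14 degrees off the line of sight, which is the offset the modified video image data would have to compensate.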

wherein:

the video image data captured by the first image capture device comprise at least a depiction of the head of the first user,

the pose of the head of the first user is determined in the captured video image data,

the direction of gaze of the first user is detected from the determined pose of the head of the first user, and

the following steps are carried out during the processing of the captured video image data:

a) creating a deformable three-dimensional model of the head of the first user,

b) projecting the captured video image data into the created three-dimensional model of the first user so that a first three-dimensional representation of the head of the first user captured by the first image capture device is created, said first three-dimensional representation having at least one gap region resulting from occluded regions of the head of the first user that are not visible in the captured video image data,

c) calculating a texture to fill the gap region,

d) generating a second three-dimensional representation of the head of the first user, in which the gap region is filled with the calculated texture, and

e) modifying the captured video image data in such a way that the head of the first user is depicted by the second three-dimensional representation such that the target direction of gaze of the head of the first user in the modified video image data of the first user appears as if the first image capture device were arranged on a straight line passing through a first surrounding region of the eyes of the first user and through a second surrounding region of the eyes of the second user depicted on the first display device,
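Steps b) to d) above can be sketched on a toy model. The example below is an illustration under loud assumptions, not the claimed implementation: the head model is reduced to four vertices with outward normals, visibility is decided by a simple normal test, and the gap texture is calculated by copying the colour of the nearest visible vertex, which is a swapped-in placeholder because the claim does not specify how the texture is calculated. Step a) (creating a deformable model) and step e) (re-rendering with the target gaze) are not covered. The names `project_texture`, `fill_gaps`, and `sample_colour` are hypothetical.

```python
def dot(u, v):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(u, v))

def project_texture(vertices, view_dir, sample_colour):
    """Step b (toy version): colour every vertex whose normal faces the camera.
    Occluded vertices get None and together form the gap region."""
    colours = []
    for position, normal in vertices:
        visible = dot(normal, view_dir) > 0.0
        colours.append(sample_colour(position) if visible else None)
    return colours

def fill_gaps(vertices, colours):
    """Steps c and d (toy version): give each gap vertex the colour of the
    nearest visible vertex and return the completed colour list."""
    visible = [(pos, col) for (pos, _), col in zip(vertices, colours)
               if col is not None]
    if not visible:
        raise ValueError("no visible vertex to copy a texture from")
    filled = []
    for (pos, _), col in zip(vertices, colours):
        if col is None:
            # Squared distance is enough for choosing the nearest neighbour.
            _, col = min(
                visible,
                key=lambda pc: sum((a - b) ** 2 for a, b in zip(pc[0], pos)),
            )
        filled.append(col)
    return filled

# Hypothetical four-vertex "head": (position, outward normal).
vertices = [
    ((0.0, 0.0, 1.0), (0.0, 0.0, 1.0)),    # front
    ((-1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)),  # left
    ((1.0, 0.0, 0.0), (1.0, 0.0, 0.0)),    # right
    ((0.0, 0.0, -1.0), (0.0, 0.0, -1.0)),  # back
]
view_dir = (0.2, 0.0, 1.0)  # direction from the head towards the camera

def sample_colour(position):
    """Stand-in for looking up a pixel of the captured video image."""
    return (round(100 + 50 * position[0]), 100, 100)

first_representation = project_texture(vertices, view_dir, sample_colour)
filled = fill_gaps(vertices, first_representation)
print(first_representation)
print(filled)
```

Here the left and back vertices are occluded, so the first representation contains two gaps, and the second representation has every vertex coloured.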

wherein

during the processing of the video image data, the modified video image data are generated by a Generative Adversarial Network (GAN) with a generator network and a discriminator network,

the generator network generating modified video image data and the discriminator network evaluating a similarity between the depiction of the head of the first user in the modified video image data and the captured video image data and also evaluating a match between the direction of gaze of the first user in the modified video image data and the target direction of gaze.
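The two criteria evaluated by the discriminator network, similarity to the captured depiction and match with the target direction of gaze, can be made concrete with a hand-written scoring function. The sketch below is not a Generative Adversarial Network and not the trained discriminator of the claim; it is only a stand-in that combines both criteria into one penalty. Images are assumed to be flat lists of intensities in [0, 1], gaze directions are assumed to be 3D vectors, and the names and weights (`w_similarity`, `w_gaze`) are hypothetical.

```python
import math

def mean_abs_difference(image_a, image_b):
    """Similarity term: 0.0 for identical images, larger for dissimilar ones."""
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(a - b) for a, b in zip(image_a, image_b)) / len(image_a)

def gaze_error_deg(gaze, target_gaze):
    """Gaze term: angle in degrees between the depicted and the target gaze."""
    dot = sum(x * y for x, y in zip(gaze, target_gaze))
    lengths = math.sqrt(sum(x * x for x in gaze)) * math.sqrt(
        sum(x * x for x in target_gaze)
    )
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / lengths))))

def combined_penalty(captured, modified, gaze, target_gaze,
                     w_similarity=1.0, w_gaze=0.1):
    """Weighted sum of both criteria; lower is better for the generator."""
    return (w_similarity * mean_abs_difference(captured, modified)
            + w_gaze * gaze_error_deg(gaze, target_gaze))

captured = [0.2, 0.4, 0.6, 0.8]
target_gaze = (0.0, 0.0, -1.0)

# Same head, gaze already on target: nothing to penalise.
perfect = combined_penalty(captured, captured, (0.0, 0.0, -1.0), target_gaze)
# Same head, but gaze pointing 90 degrees away from the target.
off_target = combined_penalty(captured, captured, (0.0, 1.0, 0.0), target_gaze)
print(perfect, off_target)
```

In an actual GAN both evaluations would be learned from data rather than computed by fixed formulas; the sketch only shows that a candidate output can score well on one criterion and poorly on the other.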