US 11,659,193 B2
Framework for video conferencing based on face restoration
Wei Jiang, Palo Alto, CA (US); Wei Wang, Palo Alto, CA (US); and Shan Liu, Palo Alto, CA (US)
Assigned to TENCENT AMERICA LLC, Palo Alto, CA (US)
Filed by TENCENT AMERICA LLC, Palo Alto, CA (US)
Filed on Sep. 30, 2021, as Appl. No. 17/490,103.
Claims priority of provisional application 63/134,522, filed on Jan. 6, 2021.
Prior Publication US 2022/0217371 A1, Jul. 7, 2022
Int. Cl. H04N 19/29 (2014.01); H04N 19/30 (2014.01); H04N 19/85 (2014.01); H04N 19/17 (2014.01); G06T 3/40 (2006.01); G06T 7/62 (2017.01); G06V 40/16 (2022.01); G06N 3/045 (2023.01)
CPC H04N 19/29 (2014.11) [G06N 3/045 (2023.01); G06T 3/40 (2013.01); G06T 7/62 (2017.01); G06V 40/161 (2022.01); H04N 19/17 (2014.11); H04N 19/30 (2014.11); H04N 19/85 (2014.11); G06T 2207/10016 (2013.01); G06T 2207/30201 (2013.01)]
8 Claims
 
1. A method for video coding performed by at least one processor, the method comprising:
obtaining video data;
detecting at least one face from at least one frame of the video data;
determining a set of facial landmark features of the at least one face from the at least one frame of the video data;
determining an extended face area (EFA) which comprises a boundary area extended from an area of the detected at least one face from the at least one frame of the video data;
determining a set of EFA features from the EFA; and
coding the video data at least partly by a neural network based on the determined set of facial landmark features and on aggregating the set of facial landmark features, reconstructed EFA features, and an up-sampled sequence that is up-sampled from at least one down-sampled sequence,
wherein the video data comprises an encoded bitstream of the video data,
wherein determining the set of facial landmark features comprises up-sampling the at least one down-sampled sequence obtained by decompressing the encoded bitstream,
wherein determining the EFA and determining the set of EFA features comprise up-sampling the at least one down-sampled sequence obtained by decompressing the encoded bitstream, and
wherein determining the EFA and determining the set of EFA features further comprise reconstructing the EFA features, into the reconstructed EFA features, each respective to ones of the facial landmark features of the set of facial landmark features by a generative adversarial network.
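Two of the geometric steps recited in claim 1, extending the detected face area into an EFA and up-sampling a down-sampled sequence, can be illustrated with a minimal sketch. The claim does not say how far the boundary is extended or which up-sampling method is used, so the fractional margin, the clamping to the frame, and the nearest-neighbour interpolation below are assumptions chosen for illustration only. The function names extend_face_area and upsample_nearest are hypothetical. Face detection, landmark extraction, and the generative adversarial network are not modelled here.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), with x1 and y1 exclusive


def extend_face_area(box: Box, frame_w: int, frame_h: int,
                     margin: float = 0.25) -> Box:
    """Grow a detected face box into an extended face area (EFA).

    Assumption: the boundary is extended by `margin` times the box width
    and height on each side, then clamped to the frame. The claim only
    requires that the EFA include a boundary area extended from the face.
    """
    x0, y0, x1, y1 = box
    dx = round(margin * (x1 - x0))
    dy = round(margin * (y1 - y0))
    return (max(0, x0 - dx), max(0, y0 - dy),
            min(frame_w, x1 + dx), min(frame_h, y1 + dy))


def upsample_nearest(frame: List[List[int]], factor: int) -> List[List[int]]:
    """Up-sample a 2-D frame by an integer factor (nearest neighbour).

    Assumption: nearest-neighbour is used only because it is the simplest
    method; the claim does not name an interpolation method.
    """
    out: List[List[int]] = []
    for row in frame:
        # Repeat each sample horizontally, then repeat the row vertically.
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


if __name__ == "__main__":
    # A 40x60 face box in a 128x96 frame, extended by 25% on each side.
    print(extend_face_area((40, 30, 80, 90), 128, 96))
    # A 2x2 down-sampled frame restored to 4x4.
    print(upsample_nearest([[1, 2], [3, 4]], 2))
```

In this sketch the bottom edge of the extended box is clamped to the frame height, so an EFA near the frame border is smaller than the nominal margin would suggest.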