CPC H04N 19/29 (2014.11) [G06N 3/045 (2023.01); G06T 3/40 (2013.01); G06T 7/62 (2017.01); G06V 40/161 (2022.01); H04N 19/17 (2014.11); H04N 19/30 (2014.11); H04N 19/85 (2014.11); G06T 2207/10016 (2013.01); G06T 2207/30201 (2013.01)] | 8 Claims |
1. A method for video coding performed by at least one processor, the method comprising:
obtaining video data;
detecting at least one face from at least one frame of the video data;
determining a set of facial landmark features of the at least one face from the at least one frame of the video data;
determining an extended face area (EFA) which comprises a boundary area extended from an area of the detected at least one face from the at least one frame of the video data;
determining a set of EFA features from the EFA; and
coding the video data at least partly by a neural network based on the determined set of facial landmark features and on aggregating the set of facial landmark features, reconstructed EFA features, and an up-sampled sequence that is up-sampled from at least one down-sampled sequence,
wherein the video data comprises an encoded bitstream of the video data,
wherein determining the set of facial landmark features comprises up-sampling the at least one down-sampled sequence obtained by decompressing the encoded bitstream,
wherein determining the EFA and determining the set of EFA features comprise up-sampling the at least one down-sampled sequence obtained by decompressing the encoded bitstream, and
wherein determining the EFA and determining the set of EFA features further comprise reconstructing, by a generative adversarial network, the EFA features into the reconstructed EFA features, each corresponding to respective ones of the facial landmark features of the set of facial landmark features.
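The decoder-side flow recited in the claim (decompress → up-sample → detect face → extract landmark features → derive the extended face area → GAN-reconstruct EFA features → aggregate) can be sketched as toy code. Every function below is a hypothetical stand-in, not the patent's actual networks: the "detector" returns a fixed box, the "landmark extractor" pools patch means, and the "GAN" is a simple landmark-conditioned modulation, purely to make the data flow between the claimed steps concrete.

```python
import numpy as np

def upsample(seq, factor=2):
    """Nearest-neighbour up-sampling of a (T, H, W) sequence (stand-in
    for the learned up-sampler implied by the claim)."""
    return seq.repeat(factor, axis=1).repeat(factor, axis=2)

def detect_face(frame):
    """Hypothetical face detector: returns a (y, x, h, w) box.
    A fixed central box stands in for a real detector."""
    H, W = frame.shape
    return (H // 4, W // 4, H // 2, W // 2)

def extended_face_area(box, margin, frame_shape):
    """EFA = the face box grown by a boundary margin, clipped to the frame."""
    y, x, h, w = box
    H, W = frame_shape
    y0, x0 = max(0, y - margin), max(0, x - margin)
    y1, x1 = min(H, y + h + margin), min(W, x + w + margin)
    return (y0, x0, y1 - y0, x1 - x0)

def landmark_features(frame, box):
    """Hypothetical landmark extractor: patch-mean pooling inside the
    face box stands in for learned facial landmark features."""
    y, x, h, w = box
    face = frame[y:y + h, x:x + w]
    return np.array([p.mean() for p in np.array_split(face.ravel(), 5)])

def gan_reconstruct_efa(efa_patch, landmarks):
    """Stand-in for the generative adversarial network: conditions the
    EFA features on the landmark features (here, a scalar modulation)."""
    return efa_patch * (1.0 + 0.01 * landmarks.mean())

def decode_frame(down_sampled):
    """One pass through the claim's decoder-side steps."""
    up = upsample(down_sampled)                # up-sample the decompressed sequence
    frame = up[0]
    box = detect_face(frame)                   # detect at least one face
    lms = landmark_features(frame, box)        # facial landmark features
    ey, ex, eh, ew = extended_face_area(box, margin=4, frame_shape=frame.shape)
    efa = frame[ey:ey + eh, ex:ex + ew]        # EFA features
    recon_efa = gan_reconstruct_efa(efa, lms)  # GAN-reconstructed EFA features
    # aggregate landmark features, reconstructed EFA features, and the
    # up-sampled sequence into the final reconstructed frame
    out = frame.copy()
    out[ey:ey + eh, ex:ex + ew] = recon_efa
    return out, lms

rng = np.random.default_rng(0)
down = rng.random((1, 16, 16))                 # one 16x16 down-sampled frame
frame, lms = decode_frame(down)
print(frame.shape, lms.shape)                  # (32, 32) (5,)
```

In the claimed method these stand-ins would be learned components, and the aggregation feeds a neural network that produces the coded output; the sketch only fixes the order and wiring of the steps.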