CPC G06T 17/20 (2013.01) [G06T 7/11 (2017.01); G06T 7/50 (2017.01); G06T 13/40 (2013.01); G06T 15/04 (2013.01); G06V 10/766 (2022.01); G06V 10/776 (2022.01); G06V 40/171 (2022.01); G06V 40/174 (2022.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/30201 (2013.01)] | 12 Claims |
1. A method for reconstructing a three-dimensional (3D) face model from an input 2D image of a face, the method comprising:
feeding the input 2D image into at least one preparatory network to generate image features;
feeding the image features to a parameter regressor network to predict at least one of: parameters of a 3D face parametric model, camera parameters, hair parameters and wrinkle parameters;
generating an initial UV texture map using the input 2D image and the at least one of the parameters of the 3D face parametric model, the camera parameters, the hair parameters or the wrinkle parameters;
feeding the initial UV texture map into a UV completion network to generate a full UV texture map and illumination parameters, for completing areas that are missing in the initial UV texture map;
generating the 3D face model using the parameters of the 3D face parametric model, the camera parameters, the hair parameters, the wrinkle parameters, the full UV texture map and the illumination parameters;
rendering a rendered 2D image from the 3D face model;
calculating a loss function based on differences between the input 2D image and the rendered 2D image; and
training the parameter regressor network and the UV completion network using the loss function to obtain a trained parameter regressor network and a trained UV completion network,
wherein the image features comprise 3D landmarks, facial features, and segmentation maps, and wherein the at least one preparatory network comprises:
a 3D landmark detector network used to obtain the 3D landmarks;
a feature extractor (FE) network used to obtain the facial features, the facial features comprising a face recognition embedding, an expression embedding and a mouth animation keypoint; and
a face segmentation (FS) network used to obtain the segmentation maps,
wherein the 3D face parametric model is a faces learned with an articulated model and expressions (FLAME) model, and wherein the parameters of the 3D face parametric model comprise jaw parameters, neck parameters, translation parameters, shape parameters and expression parameters,
wherein the parameter regressor network comprises subnetworks M1_0, M1_1, M1_2 and M1_3, wherein feeding the 3D landmarks, the facial features, and the segmentation maps into the parameter regressor network comprises:
feeding the 3D landmarks into subnetwork M1_0 to predict jaw parameters, neck parameters, translation parameters and camera parameters;
feeding the face recognition embedding into subnetwork M1_1 to predict shape parameters;
feeding the expression embedding and the mouth animation keypoint into subnetwork M1_2 to predict expression parameters and wrinkle parameters; and
feeding the shape parameters and the segmentation maps into subnetwork M1_3 to predict the hair parameters,
wherein the parameters of the 3D face parametric model comprise the jaw parameters, the neck parameters, the translation parameters, the shape parameters and the expression parameters.
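The claim recites a specific routing of image features through regressor subnetworks M1_0 to M1_3. The Python/PyTorch listing below is a minimal sketch of that routing only; all layer widths, embedding sizes and parameter dimensions (for example the FLAME-style 100 shape and 50 expression coefficients, the 68-point landmark layout and the 19-class segmentation features) are illustrative assumptions and are not taken from the patent.

import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=256):
    # Small multilayer perceptron used for every regressor subnetwork (assumed architecture).
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

class ParameterRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        # M1_0: 3D landmarks (68 x 3, assumed) -> jaw (3) + neck (3) + translation (3) + camera (3)
        self.m1_0 = mlp(68 * 3, 12)
        # M1_1: face recognition embedding (512, assumed) -> shape parameters (100, assumed)
        self.m1_1 = mlp(512, 100)
        # M1_2: expression embedding (256) + mouth animation keypoints (20 x 2) -> expression (50) + wrinkle (10)
        self.m1_2 = mlp(256 + 40, 60)
        # M1_3: shape parameters (100) + pooled segmentation-map features (19) -> hair parameters (32)
        self.m1_3 = mlp(100 + 19, 32)

    def forward(self, landmarks_3d, id_embed, expr_embed, mouth_kpts, seg_feat):
        jaw, neck, translation, camera = self.m1_0(landmarks_3d.flatten(1)).split([3, 3, 3, 3], dim=1)
        shape = self.m1_1(id_embed)
        expression, wrinkle = self.m1_2(torch.cat([expr_embed, mouth_kpts.flatten(1)], dim=1)).split([50, 10], dim=1)
        hair = self.m1_3(torch.cat([shape, seg_feat], dim=1))
        return {"jaw": jaw, "neck": neck, "translation": translation, "camera": camera,
                "shape": shape, "expression": expression, "wrinkle": wrinkle, "hair": hair}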
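The remaining steps of the claim (UV texture generation, UV completion, rendering, loss calculation and training of the two networks) follow an analysis-by-synthesis pattern. The sketch below shows only that loss and optimisation structure; unwrap_uv and render are assumed stand-in callables for the claimed UV map generation and differentiable rendering, and the L1 photometric loss is chosen for illustration rather than taken from the patent.

import torch.nn.functional as F

def training_step(img, feats, regressor, uv_completion, unwrap_uv, render, optimizer):
    # Predict face-model, camera, hair and wrinkle parameters from the image features.
    params = regressor(**feats)
    # Generate the initial (incomplete) UV texture map from the input image; unwrap_uv is an assumed helper.
    partial_uv = unwrap_uv(img, params)
    # Complete the missing UV areas and predict illumination parameters.
    full_uv, illum = uv_completion(partial_uv)
    # Differentiably render a 2D image from the reconstructed 3D face; render is an assumed stand-in.
    rendered = render(params, full_uv, illum)
    # Loss based on differences between the input and rendered images.
    loss = F.l1_loss(rendered, img)
    optimizer.zero_grad()
    loss.backward()      # gradients reach both the regressor and the UV completion network
    optimizer.step()
    return loss.item()

In such a setup the optimizer would hold only the parameters of the regressor and UV completion networks, which is consistent with the claim training those two networks while the preparatory networks are used as fixed feature providers.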