US 12,450,810 B2
Animated facial expression and pose transfer utilizing an end-to-end machine learning model
Cameron Smith, Santa Clara, CA (US)
Assigned to Adobe Inc., San Jose, CA (US)
Filed by Adobe Inc., San Jose, CA (US)
Filed on Mar. 27, 2023, as Appl. No. 18/190,684.
Prior Publication US 2024/0331247 A1, Oct. 3, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06T 13/40 (2011.01); G06T 7/246 (2017.01); G06V 10/94 (2022.01); G06V 40/16 (2022.01)
CPC G06T 13/40 (2013.01) [G06T 7/251 (2017.01); G06V 10/95 (2022.01); G06V 40/176 (2022.01); G06T 2200/24 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/30201 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A computer-implemented method comprising:
extracting, utilizing a first three-dimensional encoder, a first target facial expression animation embedding for a first resolution from a first frame of a target digital video portraying a target animation of a face;
extracting, utilizing the first three-dimensional encoder, a first target pose animation embedding for the first resolution from the first frame of the target digital video;
identifying a static source digital image portraying a source face having a source shape and facial expression;
generating, utilizing a second three-dimensional encoder, a first source shape embedding for the first resolution from the static source digital image;
generating a first combined embedding by concatenating the first target facial expression animation embedding for the first resolution from the first frame of the target digital video, the first target pose animation embedding for the first resolution from the first frame of the target digital video, and the first source shape embedding for the first resolution from the static source digital image;
extracting, utilizing the first three-dimensional encoder, a second target facial expression animation embedding for a second resolution from the first frame of the target digital video portraying the target animation of the face;
extracting, utilizing the first three-dimensional encoder, a second target pose animation embedding for the second resolution from the first frame of the target digital video;
generating, utilizing the second three-dimensional encoder, a second source shape embedding for the second resolution from the static source digital image;
generating a second combined embedding by concatenating the second target facial expression animation embedding for the second resolution from the first frame of the target digital video, the second target pose animation embedding for the second resolution from the first frame of the target digital video, and the second source shape embedding for the second resolution from the static source digital image; and
generating, utilizing a facial animation generative adversarial neural network comprising a first layer corresponding to the first resolution and a second layer corresponding to the second resolution, an animation by conditioning the first layer of the facial animation generative adversarial neural network with the first combined embedding and conditioning the second layer of the facial animation generative adversarial neural network with the second combined embedding, wherein the animation portrays the source face animated according to the target animation from the target digital video.
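The per-resolution structure recited in the claim can be sketched in code. The sketch below is purely illustrative and is not Adobe's implementation: the encoder call signatures, function names (`concat`, `build_conditioning`), and embedding sizes are hypothetical. It shows only the claimed data flow, per resolution: extract target expression and pose embeddings with one encoder, extract a source shape embedding with a second encoder, and concatenate the three into the combined embedding that conditions the GAN layer for that resolution.

```python
def concat(*embeddings):
    """Concatenate the per-resolution embeddings into one conditioning vector."""
    out = []
    for e in embeddings:
        out.extend(e)
    return out


def build_conditioning(target_frame, source_image, encoders, resolutions):
    """For each resolution, combine the target expression and pose embeddings
    (from the first three-dimensional encoder) with the source shape embedding
    (from the second three-dimensional encoder), as in the claim.
    Returns a dict mapping resolution -> combined conditioning embedding."""
    enc_target, enc_source = encoders  # hypothetical encoder callables
    combined = {}
    for res in resolutions:
        expr = enc_target(target_frame, res, mode="expression")
        pose = enc_target(target_frame, res, mode="pose")
        shape = enc_source(source_image, res)
        combined[res] = concat(expr, pose, shape)
    return combined


# Demo with stub encoders that emit fixed-length embeddings per resolution;
# a real system would condition GAN layers at the matching resolutions.
enc_target = lambda frame, res, mode: [0.0] * (res // 16)
enc_source = lambda image, res: [1.0] * (res // 16)
cond = build_conditioning("frame0", "source.png", (enc_target, enc_source), [64, 128])
```

Each entry of `cond` would then condition the GAN layer whose output matches that resolution, so coarse layers receive coarse embeddings and fine layers receive fine ones.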