US 12,322,018 B2
Latent space editing and neural animation to generate hyperreal synthetic faces
Chris Ume, Bangkok (TH); Jo Plaete, London (GB); Martin Adams, Cheltenham (GB); and Thomas Graham, London (GB)
Assigned to Metaphysic Limited, London (GB)
Filed by Metaphysic Limited, London (GB)
Filed on Dec. 27, 2022, as Appl. No. 18/089,487.
Prior Publication US 2024/0212249 A1, Jun. 27, 2024
Int. Cl. G06T 13/40 (2011.01); G06N 20/00 (2019.01); G06T 13/20 (2011.01); G06T 19/00 (2011.01); G10L 13/033 (2013.01)
CPC G06T 13/40 (2013.01) [G06N 20/00 (2019.01); G06T 13/205 (2013.01); G06T 19/006 (2013.01); G10L 13/033 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A method comprising:
training, by one or more processors, a machine learning model to generate a synthetic face of a person featured in unaltered video content to obtain a trained machine learning model, wherein the training is based at least in part on input video data of an actor speaking a spoken utterance or a three-dimensional (3D) model of a face of the person that has been animated in accordance with the spoken utterance;
providing, by the one or more processors, two images of the face of the person to the trained machine learning model;
receiving, by the one or more processors, two latent space points that correspond to the two images;
determining, by the one or more processors, a neural animation vector based at least in part on a difference between the two latent space points;
applying, by the one or more processors, the neural animation vector to points within a latent space associated with the trained machine learning model to obtain modified latent space points;
generating, by the one or more processors, using the trained machine learning model, and based at least in part on the modified latent space points, instances of the synthetic face; and
overlaying, by the one or more processors, the instances of the synthetic face on two-dimensional (2D) representations of the face depicted in frames of the unaltered video content to generate output video data corresponding to altered video content featuring the person with the synthetic face speaking the spoken utterance.
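
The latent-space editing recited in claim 1 is, at its core, vector arithmetic on encoded images: encode two reference images of the face, take the difference of the resulting latent points as the neural animation vector, add scaled copies of that vector to other latent points, decode the modified points into instances of the synthetic face, and composite those instances over the face region of the unaltered frames. The sketch below illustrates those steps under stated assumptions; `encode`, `decode`, the per-frame `strengths`, and the mask-based compositing in `overlay` are hypothetical placeholders, since the claim does not tie the method to any particular model architecture or API.

```python
# Minimal sketch of the latent-space editing steps of claim 1.
# encode/decode stand in for the trained machine learning model; they are
# hypothetical placeholders, not an API defined by the patent.

import numpy as np

def encode(image: np.ndarray) -> np.ndarray:
    """Hypothetical: map a face image to a point in the model's latent space."""
    raise NotImplementedError  # supplied by the trained model

def decode(latent: np.ndarray) -> np.ndarray:
    """Hypothetical: map a latent point back to a synthetic face image."""
    raise NotImplementedError  # supplied by the trained model

def neural_animation_vector(image_a: np.ndarray, image_b: np.ndarray) -> np.ndarray:
    """'Determining' step: difference between the latent points of two images
    of the same face (e.g. mouth closed vs. mouth open)."""
    return encode(image_b) - encode(image_a)

def animate(frame_latents: list[np.ndarray],
            animation_vector: np.ndarray,
            strengths: list[float]) -> list[np.ndarray]:
    """'Applying' and 'generating' steps: shift each per-frame latent point
    along the animation vector and decode the modified points."""
    synthetic_faces = []
    for z, s in zip(frame_latents, strengths):
        z_modified = z + s * animation_vector        # modified latent space point
        synthetic_faces.append(decode(z_modified))   # instance of the synthetic face
    return synthetic_faces

def overlay(frame: np.ndarray, synthetic_face: np.ndarray,
            mask: np.ndarray) -> np.ndarray:
    """'Overlaying' step: composite the synthetic face over the 2D face region
    of an unaltered frame; `mask` marks the face region with values in [0, 1]."""
    return mask[..., None] * synthetic_face + (1.0 - mask[..., None]) * frame
```

Because the two reference images differ mainly in the attribute to be animated (for example, mouth closed versus mouth open), their latent difference approximately isolates that attribute; scaling the vector per frame then yields a smooth progression of the synthetic face, while the base latent point of each frame continues to carry the person's identity.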