| CPC G06T 13/205 (2013.01) [G06T 13/40 (2013.01); G06T 17/20 (2013.01)] | 20 Claims |

|
1. A computing system to generate a talking face from an audio signal, the computing system comprising:
one or more processors; and
one or more non-transitory computer-readable media that collectively store instructions that, when executed by the one or more processors, cause the computing system to perform operations, the operations comprising:
obtaining audio data descriptive of audio signals comprising speech;
processing the audio data with a machine-learned face geometry prediction model to predict a set of face geometries based on the audio data, wherein the machine-learned face geometry prediction model was trained to predict three-dimensional face geometries based on data descriptive of input audio signals comprising speech;
processing the audio data with a machine-learned face texture prediction model to predict a set of face textures based on the audio data, wherein the machine-learned face texture prediction model was trained to predict two-dimensional face textures based on data descriptive of input audio signals comprising speech; and
generating a synthesized video based on the audio data, the set of face geometries, and the set of face textures, wherein the synthesized video comprises a face performing movements associated with the speech of the audio data based on the set of face geometries and the set of face textures.
|
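The operations recited in claim 1 can be read as a two-branch pipeline: the same audio data feeds a geometry predictor and a texture predictor, and their outputs are combined into video frames. The following is a minimal illustrative sketch of that data flow only, not the claimed implementation. Every name in it (`Frame`, `split_into_windows`, `synthesize`, `toy_geometry_model`, `toy_texture_model`) is hypothetical, the two "models" are hand-written placeholders standing in for machine-learned models, the one-window-per-frame split is an assumption, and pairing geometry with texture stands in for an actual rendering step.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Audio = Sequence[float]        # mono audio samples (assumed representation)
Geometry = List[List[float]]   # list of 3D vertices [x, y, z]
Texture = List[List[float]]    # 2D grid of intensities


@dataclass
class Frame:
    """One synthesized video frame: a 3D face geometry plus a 2D face texture."""
    geometry: Geometry
    texture: Texture


def split_into_windows(samples: Audio, window: int) -> List[Audio]:
    """Split samples into consecutive non-overlapping windows, one per frame.

    Trailing samples that do not fill a whole window are dropped.
    """
    return [samples[i:i + window]
            for i in range(0, len(samples) - window + 1, window)]


def synthesize(samples: Audio,
               window: int,
               geometry_model: Callable[[Audio], Geometry],
               texture_model: Callable[[Audio], Texture]) -> List[Frame]:
    """Run both predictors on the same audio and pair their outputs per frame."""
    windows = split_into_windows(samples, window)
    geometries = [geometry_model(w) for w in windows]  # set of face geometries
    textures = [texture_model(w) for w in windows]     # set of face textures
    return [Frame(g, t) for g, t in zip(geometries, textures)]


def mean_energy(window: Audio) -> float:
    """Mean absolute amplitude of a window."""
    return sum(abs(s) for s in window) / len(window)


def toy_geometry_model(window: Audio) -> Geometry:
    """Placeholder for a learned model: two 'lip' vertices that part with loudness."""
    e = mean_energy(window)
    return [[0.0, e, 0.0], [0.0, -e, 0.0]]


def toy_texture_model(window: Audio) -> Texture:
    """Placeholder for a learned model: a 2x2 texture whose brightness tracks loudness."""
    e = mean_energy(window)
    return [[e, e], [e, e]]


frames = synthesize([0.0, 0.0, 0.5, -0.5, 1.0, -1.0], 2,
                    toy_geometry_model, toy_texture_model)
```

In this sketch the two predictors are passed in as plain callables so that either branch could be swapped independently, which mirrors the claim's separation of the geometry model from the texture model.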