US 12,033,259 B2
Photorealistic talking faces from audio
Vivek Kwatra, Saratoga, CA (US); Christian Frueh, Mountain View, CA (US); Avisek Lahiri, West Bengal (IN); and John Lewis, Mountain View, CA (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Appl. No. 17/796,399
Filed by Google LLC, Mountain View, CA (US)
PCT Filed Jan. 29, 2021, PCT No. PCT/US2021/015698
§ 371(c)(1), (2) Date Jul. 29, 2022,
PCT Pub. No. WO2021/155140, PCT Pub. Date Aug. 5, 2021.
Claims priority of provisional application 62/967,335, filed on Jan. 29, 2020.
Prior Publication US 2023/0343010 A1, Oct. 26, 2023
Int. Cl. G06T 13/20 (2011.01); G06T 13/40 (2011.01); G06T 17/20 (2006.01)
CPC G06T 13/205 (2013.01) [G06T 13/40 (2013.01); G06T 17/20 (2013.01)] 20 Claims
 
1. A computing system to generate a talking face from an audio signal, the computing system comprising:
one or more processors; and
one or more non-transitory computer-readable media that collectively store:
a machine-learned face geometry prediction model configured to predict a face geometry based on data descriptive of an audio signal that comprises speech;
a machine-learned face texture prediction model configured to predict a face texture based on data descriptive of the audio signal that comprises the speech; and
instructions that, when executed by the one or more processors, cause the computing system to perform operations, the operations comprising:
obtaining the data descriptive of the audio signal that comprises speech;
using the machine-learned face geometry prediction model to predict the face geometry based at least in part on the data descriptive of the audio signal;
using the machine-learned face texture prediction model to predict the face texture based at least in part on the data descriptive of the audio signal; and
combining the face geometry with the face texture to generate a three-dimensional face mesh model.
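Claim 1 describes a data flow: two separate audio-driven models each make a prediction, and the two predictions are merged into one three-dimensional face mesh. The Python sketch below shows only that flow and makes several assumptions. `LinearModel`, `FaceMesh` and `generate_talking_face` are hypothetical names that do not appear in the patent. A linear map stands in for each machine-learned model. The audio is represented as a short feature vector. The predicted texture is simplified to one RGB color per vertex, although a real texture could just as well be an image atlas. The claim does not specify model architectures, audio features, texture format or mesh topology.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class LinearModel:
    """Hypothetical stand-in for a machine-learned model: y = W x + b."""
    weights: List[List[float]]  # shape (out_dim, in_dim)
    bias: List[float]           # shape (out_dim,)

    def predict(self, x: Sequence[float]) -> List[float]:
        if any(len(row) != len(x) for row in self.weights):
            raise ValueError("audio feature length does not match model input size")
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weights, self.bias)]


@dataclass
class FaceMesh:
    vertices: List[Vec3]                    # predicted face geometry (x, y, z)
    colors: List[Vec3]                      # predicted face texture, simplified to per-vertex RGB
    triangles: List[Tuple[int, int, int]]   # fixed topology from an assumed template mesh


def _to_vec3(flat: Sequence[float]) -> List[Vec3]:
    """Group a flat model output [a0, a1, a2, b0, ...] into 3-tuples."""
    if len(flat) % 3 != 0:
        raise ValueError("model output length must be a multiple of 3")
    return [(flat[i], flat[i + 1], flat[i + 2]) for i in range(0, len(flat), 3)]


def generate_talking_face(audio_features: Sequence[float],
                          geometry_model: LinearModel,
                          texture_model: LinearModel,
                          triangles: Sequence[Tuple[int, int, int]]) -> FaceMesh:
    # Step 1: predict face geometry (vertex positions) from the audio features.
    vertices = _to_vec3(geometry_model.predict(audio_features))
    # Step 2: predict face texture from the same audio features, independently.
    raw_colors = _to_vec3(texture_model.predict(audio_features))
    # Clamp colors to the valid [0, 1] range.
    colors = [tuple(min(1.0, max(0.0, c)) for c in rgb) for rgb in raw_colors]
    if len(colors) != len(vertices):
        raise ValueError("geometry and texture predictions disagree on vertex count")
    # Step 3: combine geometry and texture into a single 3D face mesh.
    return FaceMesh(vertices=vertices, colors=colors, triangles=list(triangles))
```

For example, assume a template with three vertices that form a single triangle, `(0, 1, 2)`, and a two-element audio feature vector. In that case each model needs nine output rows (three vertices times three components) and two input columns. Keeping the two models separate, as the claim does, means geometry and texture can be trained or swapped independently while sharing the same audio input.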