US 12,380,622 B2
Robust facial animation from video using neural networks
Inaki Navarro, Zurich (CH); Dario Kneubuhler, Zurich (CH); Tijmen Verhulsdonck, Gothenburg (SE); Eloi Du Bois, Austin, TX (US); Will Welch, San Francisco, CA (US); Vivek Verma, Oakland, CA (US); Ian Sachs, Corte Madera, CA (US); and Kiran Bhat, San Francisco, CA (US)
Assigned to Roblox Corporation, San Mateo, CA (US)
Filed by Roblox Corporation, San Mateo, CA (US)
Filed on Apr. 30, 2024, as Appl. No. 18/650,744.
Application 18/650,744 is a continuation of application No. 17/677,123, filed on Feb. 22, 2022, granted, now 12,002,139.
Claims priority of provisional application 63/152,819, filed on Feb. 23, 2021.
Claims priority of provisional application 63/152,327, filed on Feb. 22, 2021.
Prior Publication US 2024/0355028 A1, Oct. 24, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06T 13/40 (2011.01); G06T 7/73 (2017.01); G06V 10/24 (2022.01); G06V 10/774 (2022.01); G06V 10/82 (2022.01); G06V 40/16 (2022.01)

CPC G06T 13/40 (2013.01) [G06T 7/73 (2017.01); G06V 10/24 (2022.01); G06V 10/774 (2022.01); G06V 10/82 (2022.01); G06V 40/161 (2022.01); G06V 40/171 (2022.01); G06V 40/174 (2022.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/30201 (2013.01)]

20 Claims

1. A computer-implemented method, comprising:

identifying, using a fully convolutional network, a set of bounding box candidates from an initial portion of a video, wherein each bounding box candidate includes a face;

refining, using a convolutional neural network, the set of bounding box candidates into a bounding box;

obtaining a first set of facial data based on the bounding box and the initial portion of the video using an output convolutional neural network;

generating a first animation frame of an animation of a three-dimensional (3D) avatar based on the first set of the facial data, wherein a face of the 3D avatar in the first animation frame is arranged based on the first set of the facial data; and

for each additional portion of the video subsequent to the first portion of the video,

detecting whether the bounding box, applied to the additional portion, includes the face;

if it is detected that the bounding box includes the face, obtaining an additional set of facial data based on the bounding box and the additional portion using the output convolutional neural network, and without using the fully convolutional network and the convolutional neural network to process the additional portion; and

generating an additional animation frame of the animation of the 3D avatar using the additional set of facial data.
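
The control flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: every name below (`animate_from_video`, `propose_boxes`, `refine_boxes`, `extract_facial_data`, `box_contains_face`, `pose_avatar`) is hypothetical, the `Box` layout is an assumption, and the networks are passed in as plain callables so that no particular model or library is implied. The claim does not state what happens when the bounding box no longer includes the face; this sketch assumes that such a portion is simply skipped.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple, TypeVar

Portion = TypeVar("Portion")      # one portion of the video, e.g. a single frame
Frame = TypeVar("Frame")          # one animation frame of the 3D avatar
Box = Tuple[int, int, int, int]   # hypothetical (x, y, width, height) layout


def animate_from_video(
    portions: Iterable[Portion],
    propose_boxes: Callable[[Portion], List[Box]],       # stands in for the fully convolutional network
    refine_boxes: Callable[[Portion, List[Box]], Box],   # stands in for the refining CNN
    extract_facial_data: Callable[[Portion, Box], dict], # stands in for the output CNN
    box_contains_face: Callable[[Portion, Box], bool],   # check applied to each additional portion
    pose_avatar: Callable[[dict], Frame],                # arranges the avatar face from facial data
) -> Iterator[Frame]:
    """Yield one avatar animation frame per usable portion of the video."""
    box: Optional[Box] = None
    for portion in portions:
        if box is None:
            # Initial portion: run both detection stages once to obtain the box.
            candidates = propose_boxes(portion)
            box = refine_boxes(portion, candidates)
        elif not box_contains_face(portion, box):
            # Assumption: the claim leaves this branch open, so the portion is skipped.
            continue
        # Initial and additional portions alike: only the output network runs here,
        # reusing the cached box instead of repeating the two detection stages.
        yield pose_avatar(extract_facial_data(portion, box))
```

Because the function is a generator, animation frames are produced as portions arrive, and the two detection stages are invoked only for the initial portion.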