| CPC G06T 13/205 (2013.01) [G06T 13/40 (2013.01); G10L 25/30 (2013.01)] | 15 Claims |

|
1. A computer-implemented method, the method comprising:
obtaining a trained generative machine-learning model, the trained generative machine-learning model configured to process (i) input data derived from speech audio and (ii) a conditioning input representing a particular facial expression to generate facial animation data corresponding to the speech audio and the particular facial expression;
obtaining input data derived from speech audio for processing by the trained generative machine-learning model;
determining a conditioning input representing a particular facial expression from a set of reference speech animation examples, each reference speech animation example comprising data derived from speech audio and corresponding ground-truth facial animation data having the particular facial expression, wherein determining the conditioning input comprises:
initializing the conditioning input;
processing, using the trained generative machine-learning model: (i) the conditioning input, and (ii) the data derived from speech audio of one or more reference speech animation examples from the set of reference speech animation examples;
generating, as output of the trained generative machine-learning model, predicted facial animation data for each reference speech animation example;
determining a loss for each reference speech animation example, wherein the loss for a reference speech animation example is dependent on the predicted facial animation data and the ground-truth facial animation data of the reference speech animation example; and
updating the conditioning input based on the losses of the speech animation examples whilst the weights of the trained generative machine-learning model are held frozen;
processing, by the trained generative machine-learning model, (i) the input data derived from speech audio for processing and (ii) the determined conditioning input representing a particular facial expression from the set of reference speech animation examples to generate facial animation data corresponding to the speech audio and the particular facial expression.
|
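
The determining steps of claim 1 (initialize the conditioning input, predict with the trained model, compute a loss per reference example, update only the conditioning input) can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the claimed implementation: the "trained model" is a hypothetical fixed linear map, the loss is assumed to be a squared error, the update is assumed to be plain gradient descent, and every name (W_AUDIO, W_COND, model, fit_conditioning, TRUE_EXPRESSION) is invented for the example.

```python
# Toy stand-in for a trained generative model. The weights below are
# hypothetical and are never modified, mirroring "held frozen" in the claim.
W_AUDIO = [[0.5, -0.2], [0.1, 0.3]]  # applied to data derived from speech audio
W_COND = [[1.0, 0.0], [0.5, 2.0]]    # applied to the conditioning input


def matvec(matrix, vector):
    """Multiply a small matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def model(audio, cond):
    """Predict facial animation data from audio features and a conditioning input."""
    from_audio = matvec(W_AUDIO, audio)
    from_cond = matvec(W_COND, cond)
    return [a + c for a, c in zip(from_audio, from_cond)]


def fit_conditioning(examples, steps=500, lr=0.05):
    """Optimise only the conditioning input against the reference examples."""
    cond = [0.0, 0.0]  # initializing the conditioning input
    for _ in range(steps):
        grad = [0.0, 0.0]
        for audio, target in examples:
            pred = model(audio, cond)  # predicted facial animation data
            residual = [p - t for p, t in zip(pred, target)]
            # Assumed loss: sum of squared residuals for this example.
            # Its gradient w.r.t. cond is 2 * W_COND^T @ residual.
            for j in range(len(cond)):
                grad[j] += 2.0 * sum(
                    W_COND[i][j] * residual[i] for i in range(len(residual))
                )
        # Update the conditioning input only; W_AUDIO and W_COND stay untouched.
        cond = [c - lr * g / len(examples) for c, g in zip(cond, grad)]
    return cond


# Hypothetical reference set: audio features paired with ground-truth animation
# produced under one (normally unknown) expression.
TRUE_EXPRESSION = [0.3, -0.7]
reference_examples = [
    (audio, model(audio, TRUE_EXPRESSION))
    for audio in ([1.0, 0.0], [0.0, 1.0], [0.5, 0.5])
]

weights_before = [row[:] for row in W_AUDIO + W_COND]
fitted = fit_conditioning(reference_examples)

# Final step of the claim: new audio plus the determined conditioning input.
new_animation = model([0.2, 0.9], fitted)
```

In this toy setting the fitted conditioning input converges to the expression used to build the reference set, and the model weights are identical before and after fitting. With a real neural network, the same pattern would typically rely on automatic differentiation with gradients enabled only for the conditioning input.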