CPC G11B 27/036 (2013.01) [G06N 3/08 (2013.01); G06T 3/18 (2024.01); G06T 5/70 (2024.01); G06T 5/77 (2024.01); G06V 10/82 (2022.01); G06V 20/44 (2022.01); G06V 40/161 (2022.01); G06T 2207/10016 (2013.01); G06T 2207/10024 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/30201 (2013.01)]
17 Claims

1. A system comprising:
one or more processors; and
one or more non-transitory computer-readable media storing instructions which, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
obtaining source video data comprising a plurality of sequences of image frames;
detecting respective instances of an object within at least some sequences of image frames of the plurality of sequences of image frames, the object being a human face; and
for a first instance of the object detected within a first sequence of image frames of the source video data:
determining a framewise location and size of the first instance of the object in the first sequence of image frames;
obtaining, using a neural renderer, replacement video data comprising a modified instance of the object; and
replacing, using the determined framewise location and size, at least part of the first instance of the object in the first sequence of image frames with at least part of the modified instance of the object,
wherein obtaining the replacement video data comprises:
processing at least a portion of each image frame of the first sequence of image frames to generate a three-dimensional synthetic model of the first instance of the object;
modifying the three-dimensional synthetic model; and
generating the replacement video data using the neural renderer and the modified three-dimensional synthetic model,
wherein modifying the three-dimensional synthetic model comprises:
obtaining driving data comprising an audio and/or video recording including speech;
processing the driving data to determine modified parameter values for the three-dimensional synthetic model corresponding to the speech; and
using the modified parameter values to modify the three-dimensional synthetic model, including progressively transitioning between unmodified parameter values for the three-dimensional synthetic model and the modified parameter values for the three-dimensional synthetic model in dependence on when speech is taking place in the driving data.
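The final "wherein" clause requires progressively transitioning between unmodified and speech-driven parameter values depending on when speech occurs in the driving data, but the claim does not say how that transition is computed. The following is a minimal sketch under stated assumptions, not the patented implementation. It assumes a per-frame boolean speech mask already obtained from the driving data (e.g. by voice-activity detection, not shown), model parameters stored as flat per-frame coefficient vectors (for instance, expression coefficients of a morphable face model), and a linear ramp of fixed length. The names `speech_blend_weights`, `blend_parameters`, and `ramp_frames` are hypothetical.

```python
from typing import List, Sequence


def speech_blend_weights(speech_active: Sequence[bool], ramp_frames: int = 5) -> List[float]:
    """Per-frame blend weight in [0, 1].

    The weight ramps linearly toward 1 while speech is active and back toward
    0 when speech stops. `speech_active` is a hypothetical per-frame mask
    derived from the driving data; `ramp_frames` is an assumed transition length.
    """
    if ramp_frames < 1:
        raise ValueError("ramp_frames must be >= 1")
    step = 1.0 / ramp_frames
    weights: List[float] = []
    w = 0.0
    for active in speech_active:
        target = 1.0 if active else 0.0
        # Move at most `step` toward the target each frame (rate-limited ramp),
        # so parameters never jump abruptly when speech starts or stops.
        if w < target:
            w = min(target, w + step)
        elif w > target:
            w = max(target, w - step)
        weights.append(w)
    return weights


def blend_parameters(
    unmodified: Sequence[Sequence[float]],
    modified: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[List[float]]:
    """Per-frame linear interpolation: (1 - w) * unmodified + w * modified."""
    if not (len(unmodified) == len(modified) == len(weights)):
        raise ValueError("all sequences must have one entry per frame")
    blended: List[List[float]] = []
    for u, m, w in zip(unmodified, modified, weights):
        blended.append([(1.0 - w) * ui + w * mi for ui, mi in zip(u, m)])
    return blended


if __name__ == "__main__":
    mask = [False, True, True, True, True, True, False, False]
    print(speech_blend_weights(mask, ramp_frames=4))
    # → [0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 0.75, 0.5]
```

The blended per-frame parameters would then drive the modified three-dimensional model that the neural renderer consumes. A linear ramp is only one choice. An easing curve (e.g. smoothstep) or exponential smoothing of the mask would satisfy the same "progressive" behaviour with softer onsets.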