| CPC G10L 13/02 (2013.01) [G06T 13/205 (2013.01); G06T 13/40 (2013.01)] | 19 Claims |

|
1. A method of generating voices for virtual characters, comprising:
receiving a plurality of source sounds, wherein the plurality of source sounds correspond to a plurality of frames of a video, the video comprising a virtual character;
converting the plurality of source sounds into a plurality of representations in a latent space using a first model, wherein each representation among the plurality of representations comprises a plurality of parameters;
generating a plurality of sounds for the virtual character in the video in real time as the plurality of source sounds are received based on modifying at least one of the plurality of parameters of each representation in the latent space;
driving movements of the virtual character in the video by utilizing landmark coordinates generated based on input images by a second model; and
improving the movements of the virtual character in the video by using the plurality of source sounds as extra input to the second model, wherein the second model is configured to control the movements of the virtual character in the video.
|
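As an illustration only, and not the claimed implementation, the encode, modify and generate steps of claim 1 can be sketched in Python. Everything named here is an assumption made for the example: the `Latent` type, its `energy` and `shape` parameters, the `encode` and `decode` functions standing in for the "first model", and the `energy_scale` modification. The claim itself does not specify the model, the latent parameters, or how they are modified. The generator processes each frame as it arrives, which loosely mirrors the "in real time as the plurality of source sounds are received" limitation. The second model that drives character movement is not sketched.

```python
import math
from dataclasses import dataclass, replace
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Latent:
    """Hypothetical latent representation of one audio frame.

    The claim only requires "a plurality of parameters"; these two
    parameters are assumptions chosen to keep the sketch small.
    """
    energy: float              # RMS level of the frame
    shape: Tuple[float, ...]   # samples divided by the RMS level


def encode(frame: List[float]) -> Latent:
    """Toy stand-in for the 'first model': frame -> latent parameters."""
    if not frame:
        return Latent(0.0, ())
    energy = math.sqrt(sum(s * s for s in frame) / len(frame))
    if energy == 0.0:
        # Silent frame: keep the length, nothing to normalize.
        return Latent(0.0, tuple(0.0 for _ in frame))
    return Latent(energy, tuple(s / energy for s in frame))


def decode(z: Latent) -> List[float]:
    """Inverse of encode: latent parameters -> audio samples."""
    return [z.energy * s for s in z.shape]


def convert_stream(frames: Iterable[List[float]],
                   energy_scale: float) -> Iterator[List[float]]:
    """Yield one generated frame per source frame, as each one arrives.

    Only the (assumed) energy parameter is modified; the shape
    parameter is passed through unchanged.
    """
    for frame in frames:
        z = encode(frame)
        z_modified = replace(z, energy=z.energy * energy_scale)
        yield decode(z_modified)
```

A short usage example under the same assumptions: `list(convert_stream([[0.5, -0.5]], 2.0))` encodes the single frame, doubles its energy parameter, and decodes it back to samples of twice the amplitude.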