CPC G06V 10/454 (2022.01) [G06T 13/205 (2013.01); G06V 40/169 (2022.01)] | 12 Claims |
1. A speech video generation device that is a computing device having one or more processors and a memory which stores one or more programs executed by the one or more processors, the speech video generation device comprising:
a first encoder, which receives an input of a first person background image of a predetermined person partially hidden by a first mask, and extracts a first image feature vector from the first person background image;
a second encoder, which receives an input of a second person background image of the person partially hidden by a second mask, and extracts a second image feature vector from the second person background image;
a third encoder, which receives an input of a speech audio signal of the person, and extracts a voice feature vector from the speech audio signal;
a combining unit, which generates a combined vector by combining the first image feature vector output from the first encoder, the second image feature vector output from the second encoder, and the voice feature vector output from the third encoder; and
a decoder, which reconstructs a speech video of the person using the combined vector as an input.
|
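The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes each encoder and the decoder is a single random linear map standing in for a trained neural network, that "combining" means concatenation, that a mask hides pixels by zeroing them, and that the decoder emits one flattened frame rather than a full video. All names (`SpeechVideoGenerator`, `apply_mask`, `make_linear`, the dimensions) are hypothetical and do not appear in the claim.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Build a random out_dim x in_dim weight matrix (stand-in for a trained network)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weights, vec):
    """Multiply the weight matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def apply_mask(image, mask):
    """Hide the pixels where mask is 1 by zeroing them (assumed masking scheme)."""
    return [0.0 if m else p for p, m in zip(image, mask)]


class SpeechVideoGenerator:
    """Three encoders, a combining unit, and a decoder, as laid out in claim 1."""

    def __init__(self, image_dim, audio_dim, feat_dim, seed=0):
        rng = random.Random(seed)
        self.first_encoder = make_linear(image_dim, feat_dim, rng)
        self.second_encoder = make_linear(image_dim, feat_dim, rng)
        self.third_encoder = make_linear(audio_dim, feat_dim, rng)
        # The decoder consumes the concatenation of all three feature vectors.
        self.decoder = make_linear(3 * feat_dim, image_dim, rng)

    def combine(self, first_feat, second_feat, voice_feat):
        """Combining unit: concatenation is an assumption, the claim says only 'combining'."""
        return first_feat + second_feat + voice_feat

    def generate_frame(self, first_masked_image, second_masked_image, audio):
        first_feat = apply_linear(self.first_encoder, first_masked_image)
        second_feat = apply_linear(self.second_encoder, second_masked_image)
        voice_feat = apply_linear(self.third_encoder, audio)
        combined = self.combine(first_feat, second_feat, voice_feat)
        frame = apply_linear(self.decoder, combined)
        return frame, combined


# Usage: a flattened 16-pixel image hidden by two different masks, plus 8 audio samples.
image = [1.0] * 16
first_mask = [1 if i >= 8 else 0 for i in range(16)]    # hides the last 8 pixels
second_mask = [1 if i >= 12 else 0 for i in range(16)]  # hides the last 4 pixels
audio = [0.5] * 8

generator = SpeechVideoGenerator(image_dim=16, audio_dim=8, feat_dim=4)
frame, combined = generator.generate_frame(
    apply_mask(image, first_mask), apply_mask(image, second_mask), audio
)
print(len(combined), len(frame))
```

In the sketch both masked inputs are derived from the same source image for brevity. The claim requires only that each be an image of the same person, partially hidden by its respective mask.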