US 12,205,342 B2
Device and method for generating speech video
Gyeongsu Chae, Seoul (KR); and Guembuel Hwang, Seoul (KR)
Assigned to DEEPBRAIN AI INC., Seoul (KR)
Appl. No. 17/763,243
Filed by DEEPBRAIN AI INC., Seoul (KR)
PCT Filed Dec. 15, 2020, PCT No. PCT/KR2020/018374 § 371(c)(1), (2) Date Mar. 24, 2022, PCT Pub. No. WO2022/045486, PCT Pub. Date Mar. 3, 2022.
Claims priority of application No. 10-2020-0107191 (KR), filed on Aug. 25, 2020.
Prior Publication US 2022/0375190 A1, Nov. 24, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06V 10/44 (2022.01); G06T 13/20 (2011.01); G06V 40/16 (2022.01)

CPC G06V 10/454 (2022.01) [G06T 13/205 (2013.01); G06V 40/169 (2022.01)]

12 Claims

1. A speech video generation device that is a computing device having one or more processors and a memory which stores one or more programs executed by the one or more processors, the speech video generation device comprising:

a first encoder, which receives an input of a first person background image of a predetermined person partially hidden by a first mask, and extracts a first image feature vector from the first person background image;

a second encoder, which receives an input of a second person background image of the person partially hidden by a second mask, and extracts a second image feature vector from the second person background image;

a third encoder, which receives an input of a speech audio signal of the person, and extracts a voice feature vector from the speech audio signal;

a combining unit, which generates a combined vector by combining the first image feature vector output from the first encoder, the second image feature vector output from the second encoder, and the voice feature vector output from the third encoder; and

a decoder, which reconstructs a speech video of the person using the combined vector as an input.
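
The data flow recited in claim 1 (three encoders, a combining unit, and a decoder) can be illustrated with a minimal Python sketch. It is not the patented implementation. Several things are assumptions made only for illustration: the fixed random linear maps stand in for trained networks, concatenation is used as one possible way of "combining" (the claim does not specify the operation), masking is done by zeroing pixels, and all names and sizes (make_linear, apply_mask, combine, generate_frame, PIXELS, AUDIO_SAMPLES, IMG_FEAT, VOICE_FEAT, the mask shapes) are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]

# Hypothetical toy sizes, chosen only to keep the example small.
PIXELS = 16         # flattened image size
AUDIO_SAMPLES = 8   # length of the speech audio window
IMG_FEAT = 4        # size of each image feature vector
VOICE_FEAT = 3      # size of the voice feature vector


def make_linear(in_dim: int, out_dim: int, seed: int) -> Callable[[Vector], Vector]:
    """Return a fixed random linear map (assumed stand-in for a trained network)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

    def apply(x: Vector) -> Vector:
        if len(x) != in_dim:
            raise ValueError(f"expected {in_dim} values, got {len(x)}")
        return [sum(w * v for w, v in zip(row, x)) for row in weights]

    return apply


def apply_mask(image: Vector, mask: List[int]) -> Vector:
    """Hide the pixels where mask == 1 by zeroing them (assumed masking scheme)."""
    return [0.0 if m else p for p, m in zip(image, mask)]


def combine(*vectors: Vector) -> Vector:
    """Combine feature vectors by concatenation (an assumption, see lead-in)."""
    combined: Vector = []
    for v in vectors:
        combined.extend(v)
    return combined


first_encoder = make_linear(PIXELS, IMG_FEAT, seed=1)
second_encoder = make_linear(PIXELS, IMG_FEAT, seed=2)
third_encoder = make_linear(AUDIO_SAMPLES, VOICE_FEAT, seed=3)
decoder = make_linear(2 * IMG_FEAT + VOICE_FEAT, PIXELS, seed=4)


def generate_frame(image: Vector, first_mask: List[int], second_mask: List[int],
                   audio: Vector) -> Tuple[Vector, Vector]:
    """Run one pass: encode both masked images and the audio, combine, decode."""
    first_feat = first_encoder(apply_mask(image, first_mask))
    second_feat = second_encoder(apply_mask(image, second_mask))
    voice_feat = third_encoder(audio)
    combined = combine(first_feat, second_feat, voice_feat)
    return decoder(combined), combined


# Usage with made-up inputs: two different masks over the same image.
image = [i / PIXELS for i in range(PIXELS)]
first_mask = [1 if i >= 8 else 0 for i in range(PIXELS)]    # hides the lower half
second_mask = [1 if i >= 12 else 0 for i in range(PIXELS)]  # hides the lower quarter
audio = [0.5] * AUDIO_SAMPLES

frame, combined = generate_frame(image, first_mask, second_mask, audio)
print(len(combined), len(frame))  # 11 16
```

With concatenation, the combined vector has length 2 × IMG_FEAT + VOICE_FEAT = 11, and the decoder maps it back to a frame of PIXELS = 16 values. A real system would presumably use learned encoders and a learned decoder and would produce a sequence of frames; the sketch shows only how the three encoder outputs feed the decoder.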