US 12,322,016 B2
Apparatus and method for generating speech synthesis image
Gyeong Su Chae, Seoul (KR); and Guem Buel Hwang, Seoul (KR)
Assigned to DEEPBRAIN AI INC., Seoul (KR)
Appl. No. 17/779,651
Filed by DEEPBRAIN AI INC., Seoul (KR)
PCT Filed Mar. 15, 2022, PCT No. PCT/KR2022/003607 § 371(c)(1), (2) Date May 25, 2022, PCT Pub. No. WO2023/153553, PCT Pub. Date Aug. 17, 2023.
Claims priority of application No. 10-2022-0017213 (KR), filed on Feb. 9, 2022.
Prior Publication US 2024/0412439 A1, Dec. 12, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06T 13/40 (2011.01); G06T 7/246 (2017.01); G06T 7/262 (2017.01); G06T 11/20 (2006.01); G06T 13/20 (2011.01); G10L 15/02 (2006.01)

CPC G06T 13/40 (2013.01) [G06T 7/246 (2017.01); G06T 7/262 (2017.01); G06T 11/206 (2013.01); G06T 13/205 (2013.01); G10L 15/02 (2013.01); G06T 2207/20076 (2013.01); G06T 2207/30201 (2013.01)]

12 Claims

1. An apparatus for generating a speech synthesis image based on machine learning, the apparatus comprising:

at least one processor configured to implement:

a first global geometric transformation predictor configured to be trained to receive each of a source image and a target image including the same person, and predict a global geometric transformation for a global motion of the person between the source image and the target image, based on the source image and the target image;

a local feature tensor predictor configured to be trained to predict a feature tensor for a local motion of the person, based on preset input data; and

an image generator configured to be trained to reconstruct the target image, based on the global geometric transformation, the source image, and the feature tensor for the local motion,

wherein the local feature tensor predictor includes a first local feature tensor predictor configured to be trained to predict a speech feature tensor for a local speech motion of the person, based on a preset voice signal, and

the local speech motion is a motion related to speech of the person.
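
The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not specify network architectures, the form of the global geometric transformation, or a training objective. The sketch assumes, purely for illustration, that the transformation is a 2x3 affine matrix, that images are nested lists of floats, and that training would minimize an L1 reconstruction error. All names (`SpeechSynthesisModel`, `toy_global`, `toy_speech`, `toy_generate`, `l1_loss`) are hypothetical, and the three toy components are placeholders standing in for trained predictors.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical stand-in types; the claim does not fix any representation.
Image = List[List[float]]    # grayscale image as rows of pixel values
Tensor = List[float]         # flat feature tensor
Affine = List[List[float]]   # assumed 2x3 affine matrix for the global motion


@dataclass
class SpeechSynthesisModel:
    """Wires together the three components recited in claim 1."""
    predict_global: Callable[[Image, Image], Affine]     # global geometric transformation predictor
    predict_speech: Callable[[Sequence[float]], Tensor]  # first local feature tensor predictor
    generate: Callable[[Affine, Image, Tensor], Image]   # image generator

    def reconstruct(self, source: Image, target: Image, voice: Sequence[float]) -> Image:
        # Global motion is estimated from the source/target pair.
        transform = self.predict_global(source, target)
        # Local speech motion is estimated from the voice signal alone.
        speech_feat = self.predict_speech(voice)
        # The generator sees the source image, never the target itself.
        return self.generate(transform, source, speech_feat)


def toy_global(source: Image, target: Image) -> Affine:
    # Placeholder: always predicts the identity transformation.
    return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]


def toy_speech(voice: Sequence[float]) -> Tensor:
    # Placeholder: one feature, the mean absolute amplitude.
    return [sum(abs(v) for v in voice) / len(voice)]


def toy_generate(transform: Affine, source: Image, speech_feat: Tensor) -> Image:
    # Placeholder: applies only the integer translation part of the
    # transform and ignores the speech feature entirely.
    h, w = len(source), len(source[0])
    tx, ty = round(transform[0][2]), round(transform[1][2])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = x - tx, y - ty
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = source[sy][sx]
    return out


def l1_loss(pred: Image, target: Image) -> float:
    # Assumed training objective: mean absolute pixel error.
    total = sum(abs(p - t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    return total / (len(target) * len(target[0]))


source = [[0.0, 1.0], [1.0, 0.0]]
target = [[0.0, 1.0], [1.0, 0.0]]
voice = [0.1, -0.2, 0.3]

model = SpeechSynthesisModel(toy_global, toy_speech, toy_generate)
recon = model.reconstruct(source, target, voice)
loss = l1_loss(recon, target)
```

In this arrangement the target image reaches the generator only through the predicted global transformation, so any mouth or lip detail in the reconstruction would have to come from the voice-derived feature tensor. This is one reading of why the claim separates global motion from local speech motion.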