US 12,437,773 B2
Apparatus and method for generating speech synthesis image
Gyeong Su Chae, Seoul (KR); and Guem Buel Hwang, Seoul (KR)
Assigned to DEEPBRAIN AI INC., Seoul (KR)
Appl. No. 17/779,693
Filed by DEEPBRAIN AI INC., Seoul (KR)
PCT Filed Mar. 15, 2022, PCT No. PCT/KR2022/003610
§ 371(c)(1), (2) Date May 25, 2022,
PCT Pub. No. WO2023/153555, PCT Pub. Date Aug. 17, 2023.
Claims priority of application No. 10-2022-0019075 (KR), filed on Feb. 14, 2022.
Prior Publication US 2024/0303830 A1, Sep. 12, 2024
Int. Cl. G10L 21/10 (2013.01); G06T 7/246 (2017.01); G10L 15/25 (2013.01)
CPC G10L 21/10 (2013.01) [G06T 7/248 (2017.01); G10L 15/25 (2013.01); G06T 2207/20076 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/30201 (2013.01)]
12 Claims
 
1. An apparatus for generating a speech synthesis image based on machine learning, the apparatus comprising:
a first global geometric transformation predictor configured to be trained to receive each of a source image and a target image including the same person, and predict a global geometric transformation for a global motion of the person between the source image and the target image, based on the source image and the target image;
a local feature tensor predictor configured to be trained to predict a feature tensor for a local motion of the person, based on input target image-related information; and
an image generator configured to be trained to reconstruct the target image, based on the global geometric transformation, the source image, and the feature tensor for the local motion,
wherein the global motion is a motion of the person with an amount greater than or equal to a preset threshold amount of motion,
wherein the first global geometric transformation predictor is further configured to:
extract a source image heat map based on the source image, the source image heat map being a probability distribution map in an image space indicating whether each pixel in the source image is a pixel related to the global motion of the person;
extract a target image heat map based on the target image, the target image heat map being a probability distribution map in the image space as to whether each pixel in the target image is a pixel related to the global motion of the person; and
calculate the global geometric transformation based on the source image heat map and the target image heat map.
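
The last three steps of claim 1 (source heat map, target heat map, transformation calculated from both) can be illustrated with a minimal sketch. The claim does not say how the transformation is calculated from the heat maps, so everything below is an assumption made for illustration: each heat map is taken to be a spatial softmax over a per-pixel score map, and the global geometric transformation is taken to be a 2D affine map that matches the mean and covariance of the two distributions. The function names (`spatial_softmax`, `heatmap_moments`, `sqrt_spd`, `global_affine`) are hypothetical and do not come from the patent.

```python
import numpy as np


def spatial_softmax(scores):
    """Turn an (H, W) score map into a probability distribution over pixels."""
    z = scores - scores.max()  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()


def heatmap_moments(heatmap):
    """Mean (2,) and covariance (2, 2) of (x, y) pixel coordinates under the heat map."""
    h, w = heatmap.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    p = heatmap.ravel()
    mean = p @ coords
    centered = coords - mean
    cov = (centered * p[:, None]).T @ centered
    return mean, cov


def sqrt_spd(m):
    """Symmetric square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * np.sqrt(vals)) @ vecs.T


def global_affine(source_heatmap, target_heatmap):
    """Assumed moment-matching affine map x_target = A @ x_source + b.

    Illustrative choice only; the claim leaves the calculation unspecified.
    """
    mu_s, cov_s = heatmap_moments(source_heatmap)
    mu_t, cov_t = heatmap_moments(target_heatmap)
    A = sqrt_spd(cov_t) @ np.linalg.inv(sqrt_spd(cov_s))
    b = mu_t - A @ mu_s
    return A, b


def gaussian_scores(h, w, cx, cy, sigma):
    """Toy score map standing in for a learned predictor's output (hypothetical)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return -((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2)


if __name__ == "__main__":
    src = spatial_softmax(gaussian_scores(32, 32, cx=10, cy=12, sigma=2.0))
    tgt = spatial_softmax(gaussian_scores(32, 32, cx=15, cy=17, sigma=2.0))
    A, b = global_affine(src, tgt)
    print(A)
    print(b)
```

Matching first and second moments fixes the affine map only up to a rotation, so this sketch is one simple way to turn two probability maps into a transformation, not the method the patent discloses. In the toy usage, the target blob is the source blob shifted by 5 pixels along each axis, so the recovered map is close to a pure translation.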