US 12,080,270 B2
Method and apparatus for text-based speech synthesis
Gyeongsu Chae, Seoul (KR); and Dalhyun Kim, Incheon (KR)
Assigned to DEEPBRAIN AI INC., Seoul (KR)
Appl. No. 17/763,337
Filed by DEEPBRAIN AI INC., Seoul (KR)
PCT Filed Dec. 22, 2020, PCT No. PCT/KR2020/018935
§ 371(c)(1), (2) Date Mar. 24, 2022,
PCT Pub. No. WO2022/065603, PCT Pub. Date Mar. 31, 2022.
Claims priority of application No. 10-2020-0124664 (KR), filed on Sep. 25, 2020.
Prior Publication US 2022/0366890 A1, Nov. 17, 2022
Int. Cl. G10L 13/033 (2013.01); G06F 40/12 (2020.01); G10L 13/047 (2013.01); G10L 13/08 (2013.01)
CPC G10L 13/08 (2013.01) [G06F 40/12 (2020.01)] 12 Claims
 
1. An apparatus for synthesizing speech, which is a computing apparatus that includes one or more processors and a memory storing one or more programs executed by the one or more processors, the apparatus comprising:
a pre-processing module configured to mark a preset classification symbol on each input unit text; and
a speech synthesis module configured to receive each unit text marked with the classification symbol and synthesize speech uttering the unit text based on the input unit text,
wherein the pre-processing module includes:
a classification unit configured to classify, for each input unit text, its position within a sentence and its relationship between sentences; and
a classification marking unit configured to mark the classification symbol on each unit text according to that unit text's position within the sentence and its relationship between sentences,
wherein the relationship between sentences indicates what kind of relationship each unit text has with the content of the previous sentence,
wherein the speech synthesis module is trained to synthesize speech of a predetermined speaker according to each unit text's position within the sentence and relationship between sentences, so that the predetermined speaker's context-dependent speech characteristics can be reflected in the synthesized speech.
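The pre-processing stage recited in claim 1 (classify each unit text by its position within the sentence and its relationship to the previous sentence, then mark it with a classification symbol) can be illustrated with a minimal sketch. Everything concrete below is a hypothetical assumption, not the patented implementation: the claim does not disclose the symbol set, how sentences are split into unit texts, or how the relationship is classified. Here the symbols (`<B>`, `<M>`, `<E>`, `<S>`, `[NEW]`, `[CONT]`, `[CONTRA]`), the helper names, and the cue-word heuristic standing in for the classification unit are all illustrative inventions. The trained speech synthesis module, which would consume the marked text, is out of scope.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical symbol sets; the patent does not disclose the actual symbols.
POSITION_SYMBOLS = {"begin": "<B>", "middle": "<M>", "end": "<E>", "single": "<S>"}
RELATION_SYMBOLS = {"new": "[NEW]", "continue": "[CONT]", "contrast": "[CONTRA]"}

# Hypothetical cue words used as a toy stand-in for the classification unit.
CONTRAST_CUES = ("however", "but", "yet")
CONTINUE_CUES = ("also", "and", "moreover", "furthermore")


@dataclass
class MarkedUnit:
    text: str
    position: str  # position of the unit text within its sentence
    relation: str  # relationship of its sentence to the previous sentence

    def marked(self) -> str:
        """Return the unit text prefixed with its classification symbols."""
        return f"{RELATION_SYMBOLS[self.relation]}{POSITION_SYMBOLS[self.position]} {self.text}"


def classify_position(index: int, count: int) -> str:
    """Classify a unit text's position within a sentence of `count` units."""
    if count == 1:
        return "single"
    if index == 0:
        return "begin"
    if index == count - 1:
        return "end"
    return "middle"


def classify_relation(sentence_units: List[str], is_first: bool) -> str:
    """Toy heuristic: look at the sentence's first word for a discourse cue."""
    if is_first:
        return "new"  # no previous sentence to relate to
    words = sentence_units[0].split()
    if not words:
        return "new"
    first_word = words[0].lower().strip(",.")
    if first_word in CONTRAST_CUES:
        return "contrast"
    if first_word in CONTINUE_CUES:
        return "continue"
    return "new"


def preprocess(sentences: List[List[str]]) -> List[MarkedUnit]:
    """Mark every unit text of every sentence with position and relation."""
    units: List[MarkedUnit] = []
    for s_idx, sentence_units in enumerate(sentences):
        relation = classify_relation(sentence_units, s_idx == 0)
        for u_idx, unit in enumerate(sentence_units):
            position = classify_position(u_idx, len(sentence_units))
            units.append(MarkedUnit(unit, position, relation))
    return units


if __name__ == "__main__":
    text = [["The weather was clear", "so we went out"], ["However", "it rained later"]]
    for unit in preprocess(text):
        print(unit.marked())
```

Running the example prints `[NEW]<B> The weather was clear`, `[NEW]<E> so we went out`, `[CONTRA]<B> However`, and `[CONTRA]<E> it rained later`. Keeping the markers as plain-text prefixes means a text-to-speech model could, under this assumption, receive them as ordinary input tokens; a practical system would likely replace the cue-word heuristic with a learned classifier.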