US 12,488,779 B2
Method and apparatus for voice synthesis based on brain waves during imagined speech
Seong Whan Lee, Seoul (KR); Young Eun Lee, Seongnam-si (KR); Seo Hyun Lee, Seoul (KR); Soo Won Kim, Daegu (KR); Sang Ho Kim, Seoul (KR); Byung Kwan Ko, Donghae-si (KR); Ji Won Lee, Seoul (KR); and Jung Sun Lee, Changwon-si (KR)
Assigned to Korea University Research and Business Foundation, Seoul (KR)
Filed by Korea University Research and Business Foundation, Seoul (KR)
Filed on May 19, 2023, as Appl. No. 18/199,670.
Claims priority of application No. 10-2022-0173551 (KR), filed on Dec. 13, 2022.
Prior Publication US 2024/0194179 A1, Jun. 13, 2024
Int. Cl. G10L 13/027 (2013.01); G06F 3/01 (2006.01); G10L 13/047 (2013.01); G10L 15/26 (2006.01); G10L 25/18 (2013.01)
CPC G10L 13/027 (2013.01) [G06F 3/015 (2013.01); G10L 13/047 (2013.01); G10L 15/26 (2013.01); G10L 25/18 (2013.01)]
10 Claims
 
1. A method for synthesizing a voice based on brain waves during imagined speech through an apparatus for synthesizing the voice based on brain waves during imagined speech, the method comprising the steps of:
a step to obtain a user's brain waves during imagined speech;
a step to convert the obtained brain waves during imagined speech into embedding vectors;
a step to generate Mel-spectrograms based on the embedding vectors;
a step to generate the voice using the Mel-spectrograms;
a step to output the generated voice; and
a step to train a generator and a discriminator,
wherein the generator can generate the Mel-spectrograms using the embedding vectors, and the discriminator can distinguish between the Mel-spectrograms generated from brain waves and the Mel-spectrograms generated from voice, and
wherein the step to train the generator and the discriminator comprises the steps of:
a step to make the generator and the discriminator learn based on the brain waves and voice signals generated during speech;
a step to perform a transfer learning for the generator and the discriminator that have learned based on the brain waves and the voice signals generated during speech; and
a step to make the generator and the discriminator re-learn based on brain waves generated during imagined speech.
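
The three training steps recited above (learning on spoken-speech data, transfer learning, and re-learning on imagined-speech data) can be illustrated with a minimal Python sketch. It is not the patented implementation. The scalar generator and discriminator, the synthetic data, the learning rates, the use of weight copying to model transfer learning, and the use of voice-derived values as the reference in the re-learning step are all assumptions made for illustration, and every name in the block is hypothetical.

```python
import copy
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def new_model():
    # Toy scalar stand-ins (assumption): generator x = wg*e + bg maps an embedding
    # to one "Mel" value; discriminator p = sigmoid(wd*x + bd) scores "from voice".
    return {"wg": 0.1, "bg": 0.0, "wd": 0.1, "bd": 0.0}


def train(model, pairs, lr, epochs):
    """Alternating adversarial updates on (embedding, voice_mel) pairs."""
    for _ in range(epochs):
        for e, real in pairs:
            fake = model["wg"] * e + model["bg"]
            # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
            p_real = sigmoid(model["wd"] * real + model["bd"])
            p_fake = sigmoid(model["wd"] * fake + model["bd"])
            grad_wd = (p_real - 1.0) * real + p_fake * fake
            grad_bd = (p_real - 1.0) + p_fake
            model["wd"] -= lr * grad_wd
            model["bd"] -= lr * grad_bd
            # Generator step with the non-saturating loss -log D(fake).
            p_fake = sigmoid(model["wd"] * fake + model["bd"])
            grad_fake = (p_fake - 1.0) * model["wd"]
            model["wg"] -= lr * grad_fake * e
            model["bg"] -= lr * grad_fake
    return model


def three_phase_schedule(spoken_pairs, imagined_pairs):
    # Phase 1: learn from brain waves and voice signals recorded during speech.
    spoken_model = train(new_model(), spoken_pairs, lr=0.05, epochs=20)
    # Phase 2: transfer learning, modeled here as reusing the phase-1 weights.
    transferred = copy.deepcopy(spoken_model)
    # Phase 3: re-learn on imagined-speech brain waves (smaller step is an assumption).
    imagined_model = train(copy.deepcopy(transferred), imagined_pairs, lr=0.01, epochs=20)
    return spoken_model, transferred, imagined_model


# Synthetic, hypothetical data: embeddings in [0, 1] paired with voice-derived values.
random.seed(0)
embeddings = [i / 15 for i in range(16)]
spoken_pairs = [(e, 2.0 * e + 1.0 + random.uniform(-0.05, 0.05)) for e in embeddings]
imagined_pairs = [(0.8 * e, target) for e, target in spoken_pairs]

spoken, transferred, imagined = three_phase_schedule(spoken_pairs, imagined_pairs)
print("after spoken-speech training:", spoken)
print("after imagined-speech re-learning:", imagined)
```

In this sketch the re-learning phase starts from the transferred weights rather than from `new_model()`, which is the point of the schedule: the imagined-speech phase adjusts a generator and discriminator that have already learned from spoken-speech data.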