US 11,869,529 B2
Speaking rhythm transformation apparatus, model learning apparatus, methods therefor, and program
Sadao Hiroya, Tokyo (JP)
Assigned to NIPPON TELEGRAPH AND TELEPHONE CORPORATION, Tokyo (JP)
Appl. No. 17/417,749
Filed by NIPPON TELEGRAPH AND TELEPHONE CORPORATION, Tokyo (JP)
PCT Filed Jun. 20, 2019, PCT No. PCT/JP2019/024438
§ 371(c)(1), (2) Date Jun. 23, 2021,
PCT Pub. No. WO2020/136948, PCT Pub. Date Jul. 2, 2020.
Claims priority of application No. 2018-242126 (JP), filed on Dec. 26, 2018.
Prior Publication US 2022/0076691 A1, Mar. 10, 2022
Int. Cl. G10L 25/30 (2013.01); G10L 21/0272 (2013.01); G10L 21/057 (2013.01)
CPC G10L 25/30 (2013.01) [G10L 21/0272 (2013.01); G10L 21/057 (2013.01)]
20 Claims
 
1. A speech rhythm conversion device comprising:
a model store configured to store a speech rhythm conversion model which is a neural network that:
receives, as an input thereto, a first feature value vector including first data associated with a speech rhythm of at least a phoneme extracted from a first speech signal resulting from a speech uttered by a speaker in a first group,
converts the speech rhythm of the first speech signal to a speech rhythm of a speaker in a second group, and
outputs the speech rhythm of the speaker in the second group;
a feature value extractor configured to extract, from an input speech signal resulting from the speech uttered by the speaker in the first group, second data associated with a vocal tract spectrum and the first data associated with the speech rhythm;
a convertor configured to receive the first feature value vector including the first data associated with the speech rhythm extracted from the input speech signal to the speech rhythm conversion model and obtain the post-conversion speech rhythm; and
a speech synthesizer configured to use the post-conversion speech rhythm and the first data associated with the vocal tract spectrum extracted from the input speech signal to generate an output speech signal.
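
The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: every name in it (Segment, ModelStore, extract_features, convert_rhythm, synthesize, convert_speech, toy_model) is hypothetical, the "speech signal" is assumed to be already segmented into phonemes, and a fixed scaling function stands in for the trained neural network. The sketch only shows that the rhythm data passes through the stored model while the vocal tract spectrum data is carried unchanged to the synthesizer.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Segment:
    """Toy stand-in for one phoneme of a speech signal (hypothetical)."""
    duration_ms: float           # rhythm-related value for this phoneme
    envelope: Tuple[float, ...]  # stand-in for vocal tract spectrum data


# A rhythm model maps a rhythm feature vector to a converted rhythm vector.
RhythmModel = Callable[[Sequence[float]], List[float]]


class ModelStore:
    """Holds the speech rhythm conversion model (the claim's model store)."""

    def __init__(self, model: RhythmModel) -> None:
        self._model = model

    def get(self) -> RhythmModel:
        return self._model


def extract_features(signal: Sequence[Segment]):
    """Split the input into rhythm data and vocal tract spectrum data."""
    rhythm = [seg.duration_ms for seg in signal]
    spectrum = [seg.envelope for seg in signal]
    return rhythm, spectrum


def convert_rhythm(store: ModelStore, rhythm: Sequence[float]) -> List[float]:
    """Feed the rhythm feature vector to the stored model."""
    return store.get()(rhythm)


def synthesize(rhythm: Sequence[float], spectrum) -> List[Segment]:
    """Recombine the post-conversion rhythm with the original spectrum data."""
    if len(rhythm) != len(spectrum):
        raise ValueError("rhythm and spectrum must cover the same phonemes")
    return [Segment(d, e) for d, e in zip(rhythm, spectrum)]


def convert_speech(store: ModelStore, signal: Sequence[Segment]) -> List[Segment]:
    """Extractor -> converter -> synthesizer, in the order the claim recites."""
    rhythm, spectrum = extract_features(signal)
    return synthesize(convert_rhythm(store, rhythm), spectrum)


def toy_model(rhythm: Sequence[float]) -> List[float]:
    """Placeholder only: halves each duration. Not a neural network."""
    return [0.5 * d for d in rhythm]
```

In this sketch, calling convert_speech(ModelStore(toy_model), signal) returns segments whose durations come from the model and whose envelopes are those of the input. A real system would replace toy_model with a network trained on speech from the first and second speaker groups, and would replace Segment with actual acoustic features.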