CPC G10L 21/007 (2013.01) [G10L 21/013 (2013.01); G10L 21/028 (2013.01)] | 15 Claims |
1. A signal processing apparatus, comprising:
a central processing unit (CPU) configured to:
receive first acoustic data of a sound of an input sound source;
receive a voice quality converter parameter, wherein
the voice quality converter parameter is trained based on a discriminator parameter, a speaker ID of a target sound source, and first training data of the sound of the input sound source,
the discriminator parameter is trained based on the first training data of the sound of the input sound source, second training data of a sound of the target sound source, and third training data of a sound of a sound source different from the input sound source and the target sound source,
the target sound source is different from the input sound source,
the discriminator parameter discriminates the input sound source of the first acoustic data,
the first training data and the second training data are based on second acoustic data of a mixed sound,
the mixed sound includes the sound of the input sound source and the sound of the target sound source, and
the second acoustic data is different from parallel data and clean data; and
convert the first acoustic data of the input sound source to third acoustic data of voice quality of the target sound source, wherein the conversion of the first acoustic data to the third acoustic data is based on the voice quality converter parameter.
|
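The data flow recited in claim 1 can be illustrated with a toy sketch. Nothing below comes from the patent itself: the nearest-centroid discriminator, the mean-offset converter, the speaker IDs "input", "target", and "other", and every function and class name are hypothetical stand-ins chosen only to show the order of dependencies. The discriminator parameter is trained from the first, second, and third training data. The converter parameter is trained from the discriminator parameter, the target speaker ID, and the first training data. The conversion is then applied to the first acoustic data. The sketch assumes feature frames are already available and does not model how the first and second training data are obtained from the mixed sound.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one acoustic feature frame (hypothetical representation)


def mean(frames: Sequence[Frame]) -> List[float]:
    """Per-dimension mean of a non-empty list of equal-length frames."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / n for i in range(dim)]


@dataclass
class DiscriminatorParam:
    # Toy stand-in: one centroid per speaker ID.
    centroids: Dict[str, List[float]]

    def discriminate(self, frame: Frame) -> str:
        """Return the speaker ID whose centroid is closest to the frame."""
        def dist2(c: List[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(frame, c))
        return min(self.centroids, key=lambda sid: dist2(self.centroids[sid]))


def train_discriminator(first: Sequence[Frame],
                        second: Sequence[Frame],
                        third: Sequence[Frame]) -> DiscriminatorParam:
    """Uses input-source, target-source, and other-source training data."""
    return DiscriminatorParam({
        "input": mean(first),
        "target": mean(second),
        "other": mean(third),
    })


@dataclass
class ConverterParam:
    # Toy stand-in: a constant offset added to every frame.
    offset: List[float]


def train_converter(disc: DiscriminatorParam,
                    target_id: str,
                    first: Sequence[Frame]) -> ConverterParam:
    """Uses the discriminator parameter, the target speaker ID, and the
    input-source training data; no target-source frames are passed in."""
    target_centroid = disc.centroids[target_id]
    source_mean = mean(first)
    return ConverterParam([t - s for t, s in zip(target_centroid, source_mean)])


def convert(first_acoustic: Sequence[Frame],
            param: ConverterParam) -> List[List[float]]:
    """Map input-source frames toward the target source's voice quality."""
    return [[x + o for x, o in zip(frame, param.offset)]
            for frame in first_acoustic]
```

In this sketch the converter never sees the target source's frames directly. It reaches them only through the trained discriminator parameter and the speaker ID, which mirrors the dependency structure of the claim and is why no parallel utterances are needed. A real system would replace both toy components with learned models.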