CPC G10L 13/047 (2013.01) [G10L 13/0335 (2013.01); G10L 13/06 (2013.01); G10L 13/02 (2013.01); G10L 13/04 (2013.01); G10L 13/10 (2013.01)] | 14 Claims |
1. An information processing method implemented by a computer, the method comprising:
providing a first piece of sound source data, which has been obtained by encoding first identification data that identifies a first sound source, wherein the first piece of sound source data represents acoustic features of the first sound source, represented as a first embedding vector in a first multidimensional space;
providing a first piece of style data, which has been obtained by encoding second identification data that identifies a first performance style, wherein the first piece of style data represents acoustic features of sound generated by the first sound source in the first performance style, represented as a first embedding vector in a second multidimensional space;
generating, using a synthesis model generated by machine learning, first feature data representative of acoustic features of a first target sound of the first sound source to be generated in the first performance style and according to first sounding conditions, by inputting into the synthesis model:
the first piece of sound source data,
the first piece of style data, and
first synthesis data representative of the first sounding conditions; and
generating a first audio signal corresponding to the first target sound using the generated first feature data.
|
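The data flow recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation: the embedding tables, the identifiers (`singer_a`, `legato`), the vector dimensions, the `synthesis_model` stand-in (a hand-written function in place of a model generated by machine learning), and the sine-based `generate_audio` step are all hypothetical and chosen only to show how sound source data, style data, and synthesis data might be combined into feature data and then into an audio signal.

```python
import math
import random

# Assumed dimensions of the two embedding spaces (illustrative only).
SOURCE_DIM = 4
STYLE_DIM = 3


def make_embedding_table(ids, dim, seed):
    """Build a toy lookup table standing in for a trained encoder."""
    rng = random.Random(seed)
    return {i: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for i in ids}


# Hypothetical identification data for sound sources and performance styles.
SOURCE_TABLE = make_embedding_table(["singer_a", "singer_b"], SOURCE_DIM, seed=0)
STYLE_TABLE = make_embedding_table(["legato", "staccato"], STYLE_DIM, seed=1)


def encode(table, identifier):
    """Encode identification data into an embedding vector."""
    return table[identifier]


def synthesis_model(source_vec, style_vec, conditions):
    """Stand-in for the learned synthesis model.

    Maps (sound source data, style data, synthesis data) to per-frame
    feature data, here a list of (f0_hz, gain) pairs.
    """
    gain = 0.5 + 0.25 * math.tanh(sum(source_vec))     # depends on the source
    depth = 0.01 * abs(math.tanh(sum(style_vec)))      # depends on the style
    frames = []
    for pitch_hz, n_frames in conditions:
        for k in range(n_frames):
            f0 = pitch_hz * (1.0 + depth * math.sin(2.0 * math.pi * k / 8.0))
            frames.append((f0, gain))
    return frames


def generate_audio(frames, sample_rate=8000, hop=80):
    """Toy signal generator: one sine oscillator driven by the feature data."""
    signal, phase = [], 0.0
    for f0, gain in frames:
        for _ in range(hop):
            phase += 2.0 * math.pi * f0 / sample_rate
            signal.append(gain * math.sin(phase))
    return signal


source_data = encode(SOURCE_TABLE, "singer_a")   # first piece of sound source data
style_data = encode(STYLE_TABLE, "legato")       # first piece of style data
synthesis_data = [(220.0, 10), (330.0, 5)]       # sounding conditions: (pitch, frames)
feature_data = synthesis_model(source_data, style_data, synthesis_data)
audio = generate_audio(feature_data)
```

Keeping the sound source and the performance style in separate embedding spaces means either one can be swapped independently, for example re-running the sketch with `"staccato"` while leaving `source_data` unchanged.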