US 11,942,071 B2
Information processing method and information processing system for sound synthesis utilizing identification data associated with sound source and performance styles
Ryunosuke Daido, Hamamatsu (JP); Merlijn Blaauw, Barcelona (ES); and Jordi Bonada, Barcelona (ES)
Assigned to YAMAHA CORPORATION, Hamamatsu (JP)
Filed by YAMAHA CORPORATION, Hamamatsu (JP)
Filed on May 4, 2021, as Appl. No. 17/307,322.
Application 17/307,322 is a continuation of application No. PCT/JP2019/043510, filed on Nov. 6, 2019.
Claims priority of application No. 2018-209288 (JP), filed on Nov. 6, 2018.
Prior Publication US 2021/0256960 A1, Aug. 19, 2021
Int. Cl. G10L 13/047 (2013.01); G10L 13/02 (2013.01); G10L 13/033 (2013.01); G10L 13/04 (2013.01); G10L 13/06 (2013.01); G10L 13/10 (2013.01)
CPC G10L 13/047 (2013.01) [G10L 13/0335 (2013.01); G10L 13/06 (2013.01); G10L 13/02 (2013.01); G10L 13/04 (2013.01); G10L 13/10 (2013.01)]
14 Claims
 
1. An information processing method implemented by a computer, the method comprising:
providing a first piece of sound source data, which has been obtained by encoding first identification data that identifies a first sound source, wherein the first piece of sound source data represents acoustic features of the first sound source, represented as a first embedding vector in a first multidimensional space;
providing a first piece of style data, which has been obtained by encoding second identification data that identifies a first performance style, wherein the first piece of style data represents acoustic features of sound generated by the first sound source in the first performance style, represented as a first embedding vector in a second multidimensional space;
generating, using a synthesis model generated by machine learning, first feature data representative of acoustic features of a first target sound of the first sound source to be generated in the first performance style and according to first sounding conditions, by inputting into the synthesis model:
the first piece of sound source data,
the first piece of style data, and
first synthesis data representative of the first sounding conditions; and
generating a first audio signal corresponding to the first target sound using the generated first feature data.
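
The data flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the identifiers (`singer_a`, `soft`), the vector dimensions, the sample rate, the lookup-table encoders, the fixed formula standing in for the machine-learned synthesis model, and the sine-tone renderer are all hypothetical choices made only to show how the two embedding vectors and the synthesis data combine into feature data and then into an audio signal.

```python
import math
import random

# Hypothetical sizes; the claim does not fix any dimensions or sample rate.
SOURCE_DIM = 4      # dimension of the first multidimensional space (sound sources)
STYLE_DIM = 3       # dimension of the second multidimensional space (styles)
SAMPLE_RATE = 8000  # samples per second for the toy renderer

rng = random.Random(0)  # fixed seed so the stand-in "encoders" are reproducible


def make_table(ids, dim):
    """Stand-in for a trained encoder: one embedding vector per identifier."""
    return {i: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for i in ids}


SOURCE_TABLE = make_table(["singer_a", "singer_b"], SOURCE_DIM)
STYLE_TABLE = make_table(["soft", "powerful"], STYLE_DIM)


def encode(table, identification_data):
    """Encode identification data into its embedding vector (a table lookup here)."""
    return table[identification_data]


def synthesis_model(source_vec, style_vec, synthesis_data):
    """Toy stand-in for the learned synthesis model.

    Returns feature data: one (f0_hz, amplitude, duration_s) tuple per note.
    A real model would be learned; here a fixed, made-up formula is used.
    """
    detune = 1.0 + 0.01 * sum(source_vec) / len(source_vec)  # source shifts pitch slightly
    gain = 0.5 + 0.25 * sum(style_vec) / len(style_vec)      # style sets loudness in [0.25, 0.75]
    features = []
    for midi_pitch, duration_s in synthesis_data:
        f0 = 440.0 * 2.0 ** ((midi_pitch - 69) / 12.0) * detune
        features.append((f0, gain, duration_s))
    return features


def generate_audio(features):
    """Toy renderer: turns each feature tuple into a sine tone."""
    signal = []
    for f0, amp, duration_s in features:
        n = int(duration_s * SAMPLE_RATE)
        signal.extend(
            amp * math.sin(2.0 * math.pi * f0 * t / SAMPLE_RATE) for t in range(n)
        )
    return signal


# The four steps of the claim, in order.
source_data = encode(SOURCE_TABLE, "singer_a")   # first piece of sound source data
style_data = encode(STYLE_TABLE, "soft")         # first piece of style data
synthesis_data = [(60, 0.25), (64, 0.25)]        # sounding conditions: (MIDI pitch, seconds)
feature_data = synthesis_model(source_data, style_data, synthesis_data)
audio_signal = generate_audio(feature_data)
```

Keeping the sound source and the performance style in separate embedding spaces, as the claim recites, means that in this sketch either identifier can be swapped (for example `singer_b` with `soft`) without touching the other input or the synthesis data.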