US 12,462,785 B2
Signal generation processing device
Takuma Okamoto, Koganei (JP); Tomoki Toda, Nagakute (JP); Yoshinori Shiga, Koganei (JP); and Hisashi Kawai, Koganei (JP)
Assigned to National Institute of Information and Communications Technology, Tokyo (JP)
Appl. No. 18/267,175
Filed by National Institute of Information and Communications Technology, Koganei (JP)
PCT Filed Dec. 17, 2021, PCT No. PCT/JP2021/046833
§ 371(c)(1), (2) Date Jun. 14, 2023,
PCT Pub. No. WO2022/145262, PCT Pub. Date Jul. 7, 2022.
Claims priority of application No. 2020-218755 (JP), filed on Dec. 28, 2020.
Prior Publication US 2024/0062742 A1, Feb. 22, 2024
Int. Cl. G10L 13/027 (2013.01)
CPC G10L 13/027 (2013.01)
9 Claims
 
2. A signal generation processing device that outputs an audio signal or an image signal corresponding to an input condition feature based on Gaussian white noise and the input condition feature, comprising:
  a first sub-model unit to an N-th sub-model unit, which are N (N is a natural number satisfying N ≥ 2) sub-model units,
  wherein the first sub-model unit to the N-th sub-model unit each include training models that each receive noise level data, an input condition feature, and a supervised signal for an audio signal or image signal corresponding to the input condition feature, and perform training processing so as to output Gaussian white noise from a noise synthesis signal that is a signal obtained by synthesizing the supervised signal and Gaussian white noise based on the noise level data, and
  wherein the first sub-model unit to the N-th sub-model unit each perform training processing of the training models included in the first sub-model unit to the N-th sub-model unit using noise levels each included in different noise level ranges, thereby obtaining trained models; and
  a control unit that sets a noise schedule,
  wherein the control unit selects a sub-model unit to be used, in performing signal generation processing, from the first sub-model unit to the N-th sub-model unit according to the noise level determined based on the noise schedule, and determines an order of processing of the sub-model units that have been selected, and
  wherein the selected sub-model units perform prediction processing using the trained model in the order determined by the control unit to obtain an audio signal or an image signal according to the input condition feature.
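
The control logic recited in the claim (a noise schedule, selection of a sub-model unit by noise level range, and an ordered sequence of prediction steps) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the `SubModel` class, the `plan` and `generate` functions, the half-open range convention, the highest-to-lowest processing order, and the placeholder update rule are assumptions chosen only to make the example runnable. The stand-in predictor returns zeros in place of a trained model.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Signal = List[float]


@dataclass
class SubModel:
    """Hypothetical stand-in for one trained sub-model unit."""
    name: str
    low: float   # inclusive lower bound of its noise level range (assumed)
    high: float  # exclusive upper bound of its noise level range (assumed)
    predict_noise: Callable[[Signal, float, Signal], Signal]

    def covers(self, noise_level: float) -> bool:
        return self.low <= noise_level < self.high


def plan(schedule: Sequence[float],
         sub_models: Sequence[SubModel]) -> List[Tuple[float, SubModel]]:
    """Control-unit sketch: pick one sub-model per noise level in the schedule.

    Assumption: steps run from the highest noise level to the lowest.
    """
    steps = []
    for level in sorted(schedule, reverse=True):
        chosen = next((m for m in sub_models if m.covers(level)), None)
        if chosen is None:
            raise ValueError(f"no sub-model covers noise level {level}")
        steps.append((level, chosen))
    return steps


def generate(condition: Signal, schedule: Sequence[float],
             sub_models: Sequence[SubModel], seed: int = 0) -> Signal:
    """Start from Gaussian white noise and apply the planned steps in order."""
    rng = random.Random(seed)
    signal = [rng.gauss(0.0, 1.0) for _ in condition]
    for level, model in plan(schedule, sub_models):
        predicted = model.predict_noise(signal, level, condition)
        # Placeholder update (assumed): remove the predicted noise, scaled
        # by the current noise level. The claim does not specify this rule.
        signal = [s - level * e for s, e in zip(signal, predicted)]
    return signal


def zero_predictor(signal: Signal, level: float, condition: Signal) -> Signal:
    """Dummy predictor used in place of a trained model."""
    return [0.0] * len(signal)


sub_models = [
    SubModel("sub_model_1", 0.0, 0.1, zero_predictor),  # low noise levels
    SubModel("sub_model_2", 0.1, 1.0, zero_predictor),  # high noise levels
]
schedule = [0.5, 0.01, 0.2, 0.05]
order = [(level, model.name) for level, model in plan(schedule, sub_models)]
output = generate([0.0, 0.0, 0.0], schedule, sub_models)
```

With this schedule, `order` lists the two steps handled by `sub_model_2` (levels 0.5 and 0.2) before the two handled by `sub_model_1` (levels 0.05 and 0.01). A schedule whose levels all fall inside one range would select only that sub-model, which is one way to read the claim's "selects a sub-model unit to be used".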