US 12,254,890 B2
Audio signal conversion model learning apparatus, audio signal conversion apparatus, audio signal conversion model learning method and program
Takuhiro Kaneko, Musashino (JP); Hirokazu Kameoka, Musashino (JP); Ko Tanaka, Musashino (JP); and Nobukatsu Hojo, Musashino (JP)
Assigned to NIPPON TELEGRAPH AND TELEPHONE CORPORATION, Tokyo (JP)
Appl. No. 18/017,800
Filed by NIPPON TELEGRAPH AND TELEPHONE CORPORATION, Tokyo (JP)
PCT Filed Jul. 27, 2020, PCT No. PCT/JP2020/028717
§ 371(c)(1), (2) Date Jan. 24, 2023,
PCT Pub. No. WO2022/024183, PCT Pub. Date Feb. 3, 2022.
Prior Publication US 2023/0274751 A1, Aug. 31, 2023
Int. Cl. G10L 21/00 (2013.01)
CPC G10L 21/00 (2013.01)
7 Claims
 
1. A voice signal conversion model learning device comprising:
a processor; and
a storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by the processor, perform processing of:
executing generation processing of generating a conversion destination voice signal on the basis of an input voice signal that is a voice signal of an input voice, conversion source attribute information that is information indicating an attribute of an input voice that is a voice represented by the input voice signal, and conversion destination attribute information indicating an attribute of a voice represented by the conversion destination voice signal that is a voice signal of a conversion destination of the input voice signal; and
executing estimation processing of estimating whether or not a voice signal that is a processing target is a voice signal representing a vocal sound actually uttered by a person on the basis of the conversion source attribute information and the conversion destination attribute information, wherein
the conversion destination voice signal is input to the processing of execution of generation processing,
the processing target is a voice signal input to the processing of execution of generation processing, and
the processing of execution of generation processing and the processing of execution of voice estimation processing are learned on the basis of an estimation result of the voice estimation processing.
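The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the per-attribute-pair offset "model", the mean-based discriminator, and the log-loss formulas are all hypothetical choices made only to show how generation processing and estimation processing could be wired together. Parameter updates are left out; the sketch only computes the losses that learning would be based on.

```python
import math


def generator(signal, src_attr, dst_attr, g_weights):
    """Hypothetical generation processing: shift every sample by an offset
    stored per (source attribute, destination attribute) pair."""
    offset = g_weights.get((src_attr, dst_attr), 0.0)
    return [s + offset for s in signal]


def discriminator(signal, src_attr, dst_attr, d_params):
    """Hypothetical estimation processing: probability that `signal` is a
    vocal sound actually uttered by a person, conditioned on both attributes."""
    w, b = d_params.get((src_attr, dst_attr), (0.0, 0.0))
    mean = sum(signal) / len(signal)
    return 1.0 / (1.0 + math.exp(-(w * mean + b)))


def adversarial_losses(real_signal, src_attr, dst_attr, g_weights, d_params):
    # Generation processing: input voice -> conversion destination voice.
    converted = generator(real_signal, src_attr, dst_attr, g_weights)
    # The conversion destination voice signal is itself input to generation
    # processing (assumed here: converted back toward the source attribute).
    reconverted = generator(converted, dst_attr, src_attr, g_weights)
    # Estimation processing on signals that were input to the generator,
    # each conditioned on the attribute pair it was passed with (assumption).
    p_real = discriminator(real_signal, src_attr, dst_attr, d_params)
    p_fake = discriminator(converted, dst_attr, src_attr, d_params)
    eps = 1e-12  # guards against log(0)
    # Losses derived from the estimation result (standard log loss, assumed).
    d_loss = -(math.log(p_real + eps) + math.log(1.0 - p_fake + eps))
    g_loss = -math.log(p_fake + eps)
    return converted, reconverted, d_loss, g_loss


if __name__ == "__main__":
    weights = {("speaker_a", "speaker_b"): 1.0, ("speaker_b", "speaker_a"): -1.0}
    out = adversarial_losses([0.5, 1.0], "speaker_a", "speaker_b", weights, {})
    print(out)
```

In this sketch both losses depend only on the discriminator outputs, which mirrors the claim language that generation and estimation are learned on the basis of the estimation result. A real system would replace the toy functions with trainable networks and apply gradient updates to both.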