CPC G10L 13/08 (2013.01) [G10L 15/05 (2013.01); G10L 15/16 (2013.01); G10L 15/187 (2013.01); G10L 2013/083 (2013.01)] | 19 Claims |
1. A computer-implemented method for processing text data for synthesized speech, the method comprising:
receiving input data representing a number corresponding to a first pronunciation and to a second pronunciation;
processing the input data to determine first segment data corresponding to a first portion of the number;
processing the input data to determine second segment data corresponding to a second portion of the number;
processing, using a neural network attention layer of an encoder, the first segment data to determine first embedding data representing the first segment data and a first context of the first segment data with respect to the input data;
processing, using the neural network attention layer, the second segment data to determine second embedding data representing the second segment data and a second context of the second segment data with respect to the input data;
processing, using a decoder, the first embedding data to determine first category data indicating that the first embedding data corresponds to a first category associated with the first pronunciation, wherein processing the first embedding data comprises processing, using a first component, the first embedding data;
processing the second embedding data to determine second category data indicating that the second embedding data corresponds to a second category corresponding to the second pronunciation; and
processing the first category data and the second category data to determine output data representing at least a first word corresponding to one of the first pronunciation and the second pronunciation.
|
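The steps recited in claim 1 (segmenting the number, embedding each segment in context with an attention layer, assigning a category to each embedding, and producing words from the categories) can be illustrated with a minimal sketch. This is not the patented implementation: the feature vectors, the identity-projection attention, the untrained decoder weights, the category labels `PAIR` and `DIGITS`, and all function names are assumptions introduced only to make the data flow concrete.

```python
import math

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
CATEGORIES = ["PAIR", "DIGITS"]  # hypothetical category labels


def segment(number, size=2):
    """Split a digit string into right-aligned chunks, e.g. "1984" -> ["19", "84"]."""
    head = len(number) % size
    parts = [number[:head]] if head else []
    parts += [number[i:i + size] for i in range(head, len(number), size)]
    return parts


def featurize(seg, index, total):
    """Hand-made stand-in for a learned input embedding (an assumption)."""
    return [int(seg) / 100.0, index / max(total - 1, 1), len(seg) / 2.0]


def attention_weights(query, keys):
    """Softmax of scaled dot products between one query and all keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def encode(segments):
    """Self-attention with identity projections: each output mixes all segments."""
    feats = [featurize(s, i, len(segments)) for i, s in enumerate(segments)]
    embeddings = []
    for query in feats:
        weights = attention_weights(query, feats)
        embeddings.append([sum(w * f[d] for w, f in zip(weights, feats))
                           for d in range(len(query))])
    return embeddings


# Untrained, hand-set weights, one row per category (purely illustrative).
DECODER_WEIGHTS = [[1.0, 0.5, 1.0], [-1.0, 0.0, 0.2]]


def decode(embedding):
    """Linear scoring followed by argmax over the hypothetical categories."""
    scores = [sum(w * x for w, x in zip(row, embedding))
              for row in DECODER_WEIGHTS]
    return CATEGORIES[scores.index(max(scores))]


def verbalize(seg, category):
    """Turn a segment into words according to its category."""
    if category == "DIGITS":
        return " ".join(ONES[int(ch)] for ch in seg)
    value = int(seg)
    if value < 20:
        return ONES[value]
    tens, ones = divmod(value, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def speak(number):
    """Run the full toy pipeline: segment, encode, decode, verbalize."""
    segments = segment(number)
    categories = [decode(e) for e in encode(segments)]
    return " ".join(verbalize(s, c) for s, c in zip(segments, categories))
```

With these hand-set weights, `speak("1984")` assigns both segments to the `PAIR` category and yields "nineteen eighty four". A trained decoder would be needed to choose between competing pronunciations of the same number based on context, which this sketch does not attempt.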