| CPC G10L 13/00 (2013.01) [G06F 16/685 (2019.01); G10L 13/04 (2013.01)] | 20 Claims |

|
1. A method for providing text-to-speech functionality, comprising:
splitting an input text into a first text set and a second text set, wherein the first text set is selected for the splitting based on the first text set being associated with suitable lyrics or a suitable lyrical characteristic matched by a key value pair in a dictionary structure, and wherein the second text set is selected for the splitting based on the second text set being suitable for speech synthesis;
extracting an audio snippet from a track corresponding to the first text set, wherein the audio snippet is a portion of the track that begins at a start time within the track and ends at an end time within the track;
creating a synthesized utterance of at least one word of the second text set;
concatenating the audio snippet and the synthesized utterance to form combined audio; and
providing an audio output that includes the combined audio.
|
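The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation, and every name in it is hypothetical: `LyricEntry`, `split_text`, `extract_snippet`, `synthesize`, `render`, and the 8000 Hz sample rate. The sketch assumes that the dictionary structure maps a lowercase lyric phrase (key) to a track identifier with start and end times (value), that audio is a plain list of float samples, and that a real speech synthesizer is replaced by a placeholder emitting 0.1 s of silence per word. Lyric matching is simple substring search, which a practical system would likely refine.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

SAMPLE_RATE = 8000  # assumed sample rate (Hz) shared by all audio in this sketch


@dataclass(frozen=True)
class LyricEntry:
    """Hypothetical dictionary value: where the lyric is sung in a track."""
    track_id: str
    start_s: float  # start time within the track, in seconds
    end_s: float    # end time within the track, in seconds


def split_text(text: str, lyric_dict: Dict[str, LyricEntry]) -> List[Tuple[str, str]]:
    """Split text into ordered ("speech" | "lyric", fragment) segments.

    Keys of lyric_dict are assumed to be lowercase; the text is lowercased
    before matching. The earliest match wins, longest key on ties.
    """
    text = text.lower()
    segments: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        best = None  # (index, key) of the next lyric match, if any
        for key in lyric_dict:
            if not key:
                continue  # ignore empty keys so the scan always advances
            idx = text.find(key, pos)
            if idx == -1:
                continue
            if best is None or idx < best[0] or (idx == best[0] and len(key) > len(best[1])):
                best = (idx, key)
        if best is None:
            rest = text[pos:].strip()
            if rest:
                segments.append(("speech", rest))
            break
        idx, key = best
        before = text[pos:idx].strip()
        if before:
            segments.append(("speech", before))
        segments.append(("lyric", key))
        pos = idx + len(key)
    return segments


def extract_snippet(track: List[float], start_s: float, end_s: float) -> List[float]:
    """Return the portion of the track between start_s and end_s."""
    start = int(round(start_s * SAMPLE_RATE))
    end = int(round(end_s * SAMPLE_RATE))
    return track[start:end]


def synthesize(words: str) -> List[float]:
    """Placeholder for a TTS engine: 0.1 s of silence per word."""
    n_words = len(words.split())
    return [0.0] * (n_words * (SAMPLE_RATE // 10))


def render(text: str,
           lyric_dict: Dict[str, LyricEntry],
           tracks: Dict[str, List[float]]) -> List[float]:
    """Concatenate track snippets and synthesized speech in text order."""
    combined: List[float] = []
    for kind, fragment in split_text(text, lyric_dict):
        if kind == "lyric":
            entry = lyric_dict[fragment]
            combined.extend(extract_snippet(tracks[entry.track_id], entry.start_s, entry.end_s))
        else:
            combined.extend(synthesize(fragment))
    return combined


# Example with made-up data: a 3 s track of constant samples and one lyric entry.
tracks = {"track-1": [1.0] * (SAMPLE_RATE * 3)}
lyric_dict = {"let it be": LyricEntry("track-1", 1.0, 2.0)}
audio = render("My advice is let it be", lyric_dict, tracks)
```

In this example the input splits into a speech segment ("my advice is") and a lyric segment ("let it be"), so `audio` holds the placeholder speech samples followed by the one-second snippet cut from the track.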