US 12,437,744 B2
Text-to-speech from media content item snippets
Rohit Kumar, Austin, TX (US); Henrik Lindström, Stockholm (SE); Henriette Cramer, San Francisco, CA (US); Sarah Mennicken, San Francisco, CA (US); Sravana Reddy, Cambridge, MA (US); and Jennifer Thom-Santelli, Boston, MA (US)
Assigned to Spotify AB, Stockholm (SE)
Filed by Spotify AB, Stockholm (SE)
Filed on May 1, 2023, as Appl. No. 18/310,136.
Application 18/310,136 is a continuation of application No. 17/146,804, filed on Jan. 12, 2021, granted, now 11,710,474.
Application 17/146,804 is a continuation of application No. 16/235,776, filed on Dec. 28, 2018, granted, now 11,114,085, issued on Sep. 7, 2021.
Prior Publication US 2023/0267912 A1, Aug. 24, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 13/00 (2006.01); G06F 16/683 (2019.01); G10L 13/04 (2013.01)

CPC G10L 13/00 (2013.01) [G06F 16/685 (2019.01); G10L 13/04 (2013.01)]

20 Claims

1. A method for providing text-to-speech functionality, comprising:

splitting an input text into a first text set and a second text set, wherein the first text set is selected for the splitting based on the first text set being associated with suitable lyrics or a suitable lyrical characteristic matched by a key value pair in a dictionary structure, and wherein the second text set is selected for the splitting based on the second text set being suitable for speech synthesis;

extracting an audio snippet from a track corresponding to the first text set, wherein the audio snippet is a portion of the track that begins at a start time within the track and ends at an end time within the track;

creating a synthesized utterance of at least one word of the second text set;

concatenating the audio snippet and the synthesized utterance to form combined audio; and

providing an audio output that includes the combined audio.
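
The steps recited in claim 1 can be illustrated with a minimal Python sketch. This is not the patented implementation, only one possible reading under stated assumptions: the "dictionary structure" is modeled as a plain dict mapping a lyric phrase to a hypothetical `Snippet` record (track ID, start time, end time), audio is modeled as a list of float samples, and `synthesize` is a placeholder that emits silence in place of a real speech synthesizer. All names (`LYRIC_INDEX`, `split_text`, `extract_snippet`, `synthesize`, `render`) are invented for illustration.

```python
from dataclasses import dataclass

SAMPLE_RATE = 8000  # assumed sample rate, samples per second


@dataclass
class Snippet:
    """Hypothetical dictionary value: where a lyric phrase occurs in a track."""
    track_id: str
    start_s: float  # start time within the track, in seconds
    end_s: float    # end time within the track, in seconds


# Assumed dictionary structure: lyric phrase (key) -> snippet location (value).
LYRIC_INDEX = {"happy birthday": Snippet("track-001", 12.0, 14.5)}


def split_text(text, index):
    """Split input text into ordered ("lyric", phrase) and ("speech", phrase) segments.

    Phrases found as keys in the index form the first text set; everything
    else is left for speech synthesis (the second text set).
    """
    words = text.lower().split()
    max_len = max((len(key.split()) for key in index), default=0)
    segments, speech_buf, i = [], [], 0
    while i < len(words):
        match = None
        # Try the longest candidate phrase first.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in index:
                match = candidate
                break
        if match is None:
            speech_buf.append(words[i])
            i += 1
            continue
        if speech_buf:
            segments.append(("speech", " ".join(speech_buf)))
            speech_buf = []
        segments.append(("lyric", match))
        i += len(match.split())
    if speech_buf:
        segments.append(("speech", " ".join(speech_buf)))
    return segments


def extract_snippet(track_samples, rate, start_s, end_s):
    """Return the portion of the track between start_s and end_s."""
    return track_samples[int(round(start_s * rate)):int(round(end_s * rate))]


def synthesize(text, rate):
    """Placeholder synthesizer: 0.25 s of silence per word (not real TTS)."""
    return [0.0] * (int(0.25 * rate) * len(text.split()))


def render(text, index, tracks, rate=SAMPLE_RATE):
    """Concatenate extracted snippets and synthesized utterances in text order."""
    combined = []
    for kind, phrase in split_text(text, index):
        if kind == "lyric":
            snip = index[phrase]
            combined.extend(
                extract_snippet(tracks[snip.track_id], rate, snip.start_s, snip.end_s)
            )
        else:
            combined.extend(synthesize(phrase, rate))
    return combined


# Usage: a 20-second dummy track stands in for real audio.
TRACKS = {"track-001": [0.1] * (SAMPLE_RATE * 20)}
combined_audio = render("Wishing you a happy birthday today", LYRIC_INDEX, TRACKS)
```

In this sketch the returned sample list plays the role of the "combined audio"; a real system would pass it to an audio output device or encoder, and would replace the silence stub with an actual speech synthesizer.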