US 12,033,644 B2
Automatic conversion of speech into song, rap or other audible expression having target meter or rhythm
Parag Chordia, Los Altos Hills, CA (US); Mark Godfrey, Atlanta, GA (US); Alexander Rae, Atlanta, GA (US); Prerna Gupta, Los Altos Hills, CA (US); and Perry R. Cook, Jacksonville, OR (US)
Assigned to SMULE, INC., San Francisco, CA (US)
Filed by SMULE, INC., San Francisco, CA (US)
Filed on Sep. 20, 2021, as Appl. No. 17/479,912.
Application 17/479,912 is a continuation of application No. 16/410,500, filed on May 13, 2019, granted, now 11,127,407.
Application 16/410,500 is a continuation of application No. 15/606,111, filed on May 26, 2017, granted, now 10,290,307, issued on May 14, 2019.
Application 15/606,111 is a continuation of application No. 13/910,949, filed on Jun. 5, 2013, granted, now 9,666,199, issued on May 30, 2017.
Application 13/910,949 is a continuation of application No. PCT/US2013/034678, filed on Mar. 29, 2013.
Application 13/910,949 is a continuation of application No. 13/853,759, filed on Mar. 29, 2013, granted, now 9,324,330, issued on Apr. 26, 2016.
Claims priority of provisional application 61/617,643, filed on Mar. 29, 2012.
Prior Publication US 2022/0180879 A1, Jun. 9, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 21/04 (2013.01); G10H 1/36 (2006.01); G10L 19/00 (2013.01); G10L 19/02 (2013.01); G10L 21/01 (2013.01); G10L 21/055 (2013.01)
CPC G10L 19/02 (2013.01) [G10H 1/366 (2013.01); G10L 19/00 (2013.01); G10L 21/055 (2013.01); G10H 2210/051 (2013.01); G10H 2240/141 (2013.01); G10H 2250/235 (2013.01)] 20 Claims
1. A computational method for transforming an input audio encoding of speech into an output that is rhythmically consistent with a target song, the method comprising:
temporally aligning successive, time-ordered ones of plural segments of the input audio encoding with respective successive pulses of a rhythmic skeleton for the target song;
temporally stretching or compressing at least some of the temporally aligned segments to substantially fill available temporal space between respective ones of the successive pulses of the rhythmic skeleton, wherein the temporal stretching or compressing is performed at rates that vary for respective ones of the temporally aligned segments in accord with respective ratios of segment length to temporal space to be filled;
padding with silence at least one segment of the temporally aligned segments to substantially fill available temporal space of the at least one segment; and
preparing a resultant audio encoding of the speech in correspondence with the temporally aligned, stretched or compressed segments of the input audio encoding.
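The steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. It assumes that segments are lists of mono samples and that the rhythmic skeleton is given as pulse times in samples, with one more pulse than there are segments. It also assumes a hypothetical `max_stretch` cap: a segment is stretched to fill the space between its pulses unless that would slow it down by more than the cap, and any remaining space is padded with silence. This is one plausible reading of how stretching and padding could combine. Linear-interpolation resampling stands in for time stretching here. Unlike a pitch-preserving method such as a phase vocoder, it also shifts pitch. The names `resample_linear` and `fit_to_skeleton` are illustrative only.

```python
from typing import List, Sequence, Tuple


def resample_linear(samples: Sequence[float], out_len: int) -> List[float]:
    """Stretch or compress a segment to out_len samples by linear interpolation.

    Simplified stand-in for time stretching: plain resampling also shifts pitch.
    """
    n = len(samples)
    if out_len <= 0:
        return []
    if n == 0:
        return [0.0] * out_len
    if n == 1 or out_len == 1:
        return [float(samples[0])] * out_len
    scale = (n - 1) / (out_len - 1)  # input positions advanced per output sample
    out = []
    for i in range(out_len):
        pos = i * scale
        lo = min(int(pos), n - 1)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


def fit_to_skeleton(
    segments: Sequence[Sequence[float]],
    pulses: Sequence[int],
    max_stretch: float = 2.0,  # hypothetical limit on how far a segment may be slowed
) -> Tuple[List[float], List[float]]:
    """Place segment i between pulses[i] and pulses[i + 1] (pulse times in samples).

    Returns the output samples and the per-segment rate (input length / stretched length).
    """
    if len(pulses) < len(segments) + 1:
        raise ValueError("need at least one more pulse than segments")
    if max_stretch < 1.0:
        raise ValueError("max_stretch must be >= 1")
    output: List[float] = [0.0] * pulses[0]  # silence until the first pulse
    rates: List[float] = []
    for seg, start, end in zip(segments, pulses, pulses[1:]):
        space = end - start  # temporal space between successive pulses
        if space <= 0:
            raise ValueError("pulses must be strictly increasing")
        # Fill the whole interval unless that would exceed the stretch cap.
        target = min(space, int(len(seg) * max_stretch))
        stretched = resample_linear(seg, target)
        # Rate varies per segment with the ratio of segment length to space filled.
        rates.append(len(seg) / target if target else 0.0)
        output.extend(stretched)
        output.extend([0.0] * (space - target))  # pad the remainder with silence
    return output, rates
```

For example, `fit_to_skeleton([[1.0, 2.0, 3.0], [1.0] * 8], [0, 6, 10], max_stretch=1.5)` produces a 10-sample output. The first segment is stretched to 4 samples (rate 0.75) and followed by 2 samples of silence, because filling all 6 samples would exceed the cap. The second segment is compressed into its 4-sample interval (rate 2.0).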