US 12,475,869 B2
Audio generation using generative artificial intelligence model
Zach Evans, Bellevue, WA (US); Julian Parker, London (GB); CJ Carr, Medford, MA (US); Zachary Zukowski, Sacramento, CA (US); Josiah Taylor, Stafford (GB); and Jordi Pons, Sabadell (ES)
Assigned to Stability AI Ltd, London (GB)
Filed by Stability AI Ltd, London (GB)
Filed on May 22, 2025, as Appl. No. 19/216,171.
Application 19/216,171 is a continuation of application No. 18/883,212, filed on Sep. 12, 2024.
Claims priority of provisional application 63/633,019, filed on Apr. 11, 2024.
Prior Publication US 2025/0322817 A1, Oct. 16, 2025
Int. Cl. G10H 1/00 (2006.01)

CPC G10H 1/0025 (2013.01) [G10H 2250/145 (2013.01); G10H 2250/215 (2013.01); G10H 2250/311 (2013.01); G10H 2250/571 (2013.01); G10H 2250/631 (2013.01)]

20 Claims

1. A system comprising:

one or more storage media storing instructions; and

one or more processors configured to execute the instructions to cause the system to:

receive a prompt describing desired characteristics of audio;

generate, using a set of machine learning models and based on the prompt, a latent space representation of the audio at a latent rate less than 40 Hz; and

generate, using the set of machine learning models and the latent space representation of the audio, an audio file at an output rate of at least 40 kHz and including the audio based on the latent space representation of the audio, the audio having a length greater than 90 seconds.
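The three numeric limits recited in claim 1 (latent rate less than 40 Hz, output rate of at least 40 kHz, audio length greater than 90 seconds) imply a large ratio between output samples and latent frames. The following is a minimal arithmetic sketch of that relationship only. The function names and the example values (20 Hz, 44,100 Hz, 95 seconds) are hypothetical and chosen purely for illustration; they are not taken from the patent, and the sketch does not represent the claimed machine learning models or any claim construction.

```python
import math


def within_claim_1_limits(latent_rate_hz, output_rate_hz, duration_s):
    """Check the three numeric limits recited in claim 1.

    Hypothetical helper: it tests only the stated thresholds,
    not any other claim element.
    """
    return (
        latent_rate_hz < 40          # latent rate less than 40 Hz
        and output_rate_hz >= 40_000  # output rate of at least 40 kHz
        and duration_s > 90           # length greater than 90 seconds
    )


def sequence_sizes(latent_rate_hz, output_rate_hz, duration_s):
    """Return (latent_frames, output_samples, samples_per_latent_frame)."""
    # Round up so the latent sequence covers the full duration.
    latent_frames = math.ceil(latent_rate_hz * duration_s)
    output_samples = int(output_rate_hz * duration_s)
    samples_per_latent_frame = output_rate_hz / latent_rate_hz
    return latent_frames, output_samples, samples_per_latent_frame


# Illustrative values only (assumptions, not from the patent).
latent_rate_hz = 20
output_rate_hz = 44_100
duration_s = 95

print(within_claim_1_limits(latent_rate_hz, output_rate_hz, duration_s))
print(sequence_sizes(latent_rate_hz, output_rate_hz, duration_s))
```

Under these assumed values, a 95-second clip corresponds to 1,900 latent frames and 4,189,500 output samples per channel, so each latent frame accounts for 2,205 output samples. At the boundary values of the claim itself (40 Hz and 40 kHz), the ratio would be 1,000 output samples per latent frame.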