| CPC G10H 1/0025 (2013.01) [G10H 2250/145 (2013.01); G10H 2250/215 (2013.01); G10H 2250/311 (2013.01); G10H 2250/571 (2013.01); G10H 2250/631 (2013.01)] | 20 Claims |

|
1. A system comprising:
one or more storage media storing instructions; and
one or more processors configured to execute the instructions to cause the system to:
receive a prompt describing desired characteristics of audio;
generate, using a set of machine learning models and based on the prompt, a latent space representation of the audio at a latent rate less than 40 Hz; and
generate, using the set of machine learning models and the latent space representation of the audio, an audio file at an output rate of at least 40 kHz and including the audio based on the latent space representation of the audio, the audio having a length greater than 90 seconds.
|
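
The three numeric limits recited in claim 1 (a latent rate below 40 Hz, an output rate of at least 40 kHz, and a length greater than 90 seconds) imply that each latent frame must account for more than 1,000 output samples, since 40,000 / 40 = 1,000. The following is a minimal sketch of that arithmetic only, not the claimed system or any actual implementation. The function names, constant names, and example values (20 Hz, 48 kHz, 120 s) are hypothetical and do not appear in the claim.

```python
import math

# Thresholds recited in claim 1. All names here are hypothetical,
# chosen only for illustration.
MAX_LATENT_RATE_HZ = 40.0      # latent rate must be less than this
MIN_OUTPUT_RATE_HZ = 40_000.0  # output rate must be at least this
MIN_DURATION_S = 90.0          # audio length must be greater than this


def within_claimed_ranges(latent_rate_hz, output_rate_hz, duration_s):
    """Return True if the three values satisfy the numeric limits of claim 1."""
    return (
        latent_rate_hz < MAX_LATENT_RATE_HZ
        and output_rate_hz >= MIN_OUTPUT_RATE_HZ
        and duration_s > MIN_DURATION_S
    )


def frame_budget(latent_rate_hz, output_rate_hz, duration_s):
    """Return (latent frames, output samples, output samples per latent frame)."""
    latent_frames = math.ceil(latent_rate_hz * duration_s)
    output_samples = math.ceil(output_rate_hz * duration_s)
    samples_per_latent_frame = output_rate_hz / latent_rate_hz
    return latent_frames, output_samples, samples_per_latent_frame


# Illustrative values only (assumed, not taken from the claim).
frames, samples, ratio = frame_budget(20.0, 48_000.0, 120.0)
print(within_claimed_ranges(20.0, 48_000.0, 120.0), frames, samples, ratio)
```

With these assumed values, a 120-second clip corresponds to 2,400 latent frames and 5,760,000 output samples, so the decoding stage expands each latent frame into 2,400 samples.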