CPC G10H 1/0025 (2013.01) [G10H 2210/111 (2013.01); G10H 2250/311 (2013.01)] — 14 Claims

1. A method for configuring a learning model for music generation, the method comprising:
providing a masked autoencoder which executes on a computing device;
providing an omnidirectional latent diffusion model which executes on a computing device, and which is operatively coupled to the masked autoencoder to process latent embeddings produced by the masked autoencoder;
training the masked autoencoder with training data using a first loss function that combines a reconstruction loss over time and frequency domains with a patch-based adversarial objective operating at different resolutions, the training including processing of first training data with the masked autoencoder, applying the first loss function to the results of the processing of the first training data by the masked autoencoder, and adjusting parameters of the masked autoencoder in accordance with the first loss function;
obtaining a pretrained diffusion model by training the omnidirectional latent diffusion model on music data represented in a latent space, the training including processing of second training data with the omnidirectional latent diffusion model, applying a second loss function to the results of the processing of the second training data by the omnidirectional latent diffusion model, and adjusting parameters of the omnidirectional latent diffusion model in accordance with the second loss function;
fine-tuning the pretrained diffusion model based on text-guided music generation;
fine-tuning the pretrained diffusion model based on bidirectional music in-painting; and
fine-tuning the pretrained diffusion model based on unidirectional music continuation.
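To make the claimed pipeline concrete, the following is a minimal PyTorch sketch of a masked autoencoder that maps waveforms to latent embeddings, as recited in the "providing" steps. The architecture, the class name `MaskedAutoencoder`, and the latent-masking scheme are all illustrative assumptions, not the claimed implementation.

```python
import torch
import torch.nn as nn

class MaskedAutoencoder(nn.Module):
    """Encodes waveforms to latent embeddings; masks random latent frames in training."""
    def __init__(self, latent_dim: int = 128, mask_ratio: float = 0.3):
        super().__init__()
        self.mask_ratio = mask_ratio
        # Strided 1-D convolutions downsample the waveform into a latent sequence.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=8, stride=4, padding=2), nn.GELU(),
            nn.Conv1d(64, latent_dim, kernel_size=8, stride=4, padding=2),
        )
        # Transposed convolutions reconstruct the waveform from the latents.
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(latent_dim, 64, kernel_size=8, stride=4, padding=2), nn.GELU(),
            nn.ConvTranspose1d(64, 1, kernel_size=8, stride=4, padding=2),
        )

    def forward(self, wav: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        z = self.encoder(wav)                       # (B, latent_dim, T')
        if self.training:
            # Zero out a random fraction of latent time steps (the "masked" part).
            keep = (torch.rand(z.shape[0], 1, z.shape[-1], device=z.device)
                    > self.mask_ratio).float()
            z = z * keep
        recon = self.decoder(z)
        return z, recon
```

The latent sequence `z` is what the downstream omnidirectional latent diffusion model would operate on; the decoder output `recon` feeds the reconstruction losses below.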
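The first loss function combines reconstruction terms over the time and frequency domains with a patch-based adversarial objective at different resolutions. Below is one plausible reading of that combination, assuming an L1 waveform term, multi-resolution STFT magnitude terms, and hinge-style generator losses from several 1-D patch discriminators; the specific weights, FFT sizes, and discriminator design are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_patch_discriminator(stride: int) -> nn.Module:
    """1-D patch discriminator; larger strides score coarser patches (illustrative)."""
    return nn.Sequential(
        nn.Conv1d(1, 32, kernel_size=15, stride=stride, padding=7),
        nn.LeakyReLU(0.2),
        nn.Conv1d(32, 1, kernel_size=15, stride=1, padding=7),  # per-patch scores
    )

def stft_l1(x: torch.Tensor, y: torch.Tensor, n_fft: int) -> torch.Tensor:
    """L1 distance between magnitude spectrograms at one STFT resolution."""
    win = torch.hann_window(n_fft, device=x.device)
    spec = lambda s: torch.stft(s.squeeze(1), n_fft, hop_length=n_fft // 4,
                                window=win, return_complex=True).abs()
    return F.l1_loss(spec(x), spec(y))

def first_loss(recon, target, discriminators):
    loss = F.l1_loss(recon, target)                  # time-domain reconstruction
    for n_fft in (512, 1024, 2048):                  # frequency-domain terms
        loss = loss + stft_l1(recon, target, n_fft)
    for disc in discriminators:                      # adversarial terms per resolution
        # Hinge generator loss: push per-patch scores of the reconstruction up.
        loss = loss + F.relu(1.0 - disc(recon)).mean()
    return loss
```

The corresponding discriminator updates (training each `disc` to separate real from reconstructed audio) are omitted here; only the generator-side term that enters the autoencoder's parameter adjustment is shown.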
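The pretraining step trains the omnidirectional latent diffusion model on music data represented in the autoencoder's latent space. A standard way to do this is DDPM-style noise prediction; the sketch below assumes that formulation, and `denoiser` stands in for whatever latent diffusion network is used.

```python
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)          # linear noise schedule (assumed)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def diffusion_step(denoiser, z0, optimizer):
    """One pretraining step. z0: clean latents (B, C, L) from the autoencoder."""
    t = torch.randint(0, T, (z0.shape[0],), device=z0.device)
    a = alphas_bar.to(z0.device)[t].view(-1, 1, 1)
    noise = torch.randn_like(z0)
    zt = a.sqrt() * z0 + (1.0 - a).sqrt() * noise   # forward noising process
    loss = F.mse_loss(denoiser(zt, t), noise)       # the second loss function
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The MSE between predicted and injected noise plays the role of the second loss function, and `optimizer.step()` performs the claimed parameter adjustment.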
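The three fine-tuning steps differ mainly in which latent frames the model must generate versus which it receives as fixed context. One way to realize this, sketched below, is a per-task binary mask over the latent timeline; the specific mask proportions and the convention (1 = generate, 0 = given) are assumptions for illustration.

```python
import torch

def task_mask(task: str, length: int) -> torch.Tensor:
    """Mask over latent frames: 1 marks frames to generate, 0 marks given context."""
    m = torch.ones(length)
    if task == "text_guided":        # generate everything from text conditioning alone
        pass
    elif task == "inpainting":       # bidirectional: context on both sides, gap in middle
        m[: length // 4] = 0
        m[-length // 4 :] = 0
    elif task == "continuation":     # unidirectional: prefix given, generate the rest
        m[: length // 2] = 0
    else:
        raise ValueError(task)
    return m
```

Under this reading, fine-tuning noises only the masked positions while feeding clean context latents elsewhere, with text conditioning applied in all three modes; a single model fine-tuned across all three mask patterns would support generation, in-painting, and continuation at inference time.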