CPC G10L 21/0308 (2013.01) [G10L 15/063 (2013.01); G10L 15/16 (2013.01)]    17 Claims

12. A method comprising:
training an audio source separation model with a first training dataset to generate a plurality of audio stems from an audio input sample;
receiving a single-track audio input stream comprising a mixture of audio signals generated from a plurality of audio sources;
first separating the audio sources, using the audio source separation model, from the single-track audio input stream in accordance with one or more processing recipes to generate a first plurality of source separated output stems corresponding to one or more of the plurality of audio sources;
retraining the audio source separation model to separate the plurality of audio sources from the single-track audio input stream using, at least in part, a second training dataset comprising at least one of the first plurality of source separated output stems generated from the single-track audio input stream; and
second separating the audio sources, using the retrained audio source separation model, from the single-track audio input stream in accordance with the one or more processing recipes to generate a second plurality of source separated output stems corresponding to the one or more of the plurality of audio sources;
wherein retraining the audio source separation model comprises training a first neural network of the audio source separation model at a first audio sample rate, and a second neural network at a second audio sample rate that is higher than the first audio sample rate.
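
The method of claim 12 amounts to a self-training loop: a separation model is first trained on a labeled dataset, its own output stems for an unlabeled single-track stream are then used as a second training dataset for retraining, and the retrained model separates the same stream again, with one network trained at a lower sample rate and another at a higher one. The following is a minimal sketch of that loop in PyTorch; the network architecture, the synthetic data, the sample rates, the loss, and the use of simple decimation for the lower rate are all illustrative assumptions and not the patent's actual implementation.

# Minimal self-training sketch for claim 12 (all specifics are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

N_SOURCES = 2        # e.g. vocals + accompaniment (assumption)
HIGH_RATE = 44100    # second, higher audio sample rate (assumption)
LOW_RATE = 11025     # first, lower audio sample rate (assumption)
SECONDS = 1

class Separator(nn.Module):
    """Tiny 1-D conv network mapping a mono mixture to N_SOURCES stems."""
    def __init__(self, n_sources=N_SOURCES):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, 15, padding=7), nn.ReLU(),
            nn.Conv1d(16, n_sources, 15, padding=7),
        )
    def forward(self, mix):          # mix: (batch, 1, time)
        return self.net(mix)         # stems: (batch, n_sources, time)

def train(model, mixtures, targets, steps=100, lr=1e-3):
    """Train the model to reconstruct the target stems from the mixtures."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.l1_loss(model(mixtures), targets)
        loss.backward()
        opt.step()
    return model

def downsample(x, factor):
    """Crude decimation standing in for resampling to the lower rate (assumption)."""
    return F.avg_pool1d(x, kernel_size=factor, stride=factor)

# First training dataset: synthetic stems and their mixtures (assumption).
stems_hi = torch.randn(8, N_SOURCES, HIGH_RATE * SECONDS)
mix_hi = stems_hi.sum(dim=1, keepdim=True)
factor = HIGH_RATE // LOW_RATE
stems_lo, mix_lo = downsample(stems_hi, factor), downsample(mix_hi, factor)

# First neural network trained at the lower rate, second at the higher rate.
model_lo = train(Separator(), mix_lo, stems_lo)
model_hi = train(Separator(), mix_hi, stems_hi)

# Single-track audio input stream: a mixture with no reference stems.
stream_hi = torch.randn(1, 1, HIGH_RATE * SECONDS)
stream_lo = downsample(stream_hi, factor)

# First separating pass: generate a first plurality of output stems.
with torch.no_grad():
    first_stems_lo = model_lo(stream_lo)
    first_stems_hi = model_hi(stream_hi)

# Retraining: the second training dataset is the stems separated from the stream.
model_lo = train(model_lo, stream_lo, first_stems_lo, steps=50)
model_hi = train(model_hi, stream_hi, first_stems_hi, steps=50)

# Second separating pass with the retrained model.
with torch.no_grad():
    second_stems_hi = model_hi(stream_hi)
print(second_stems_hi.shape)   # (1, N_SOURCES, HIGH_RATE * SECONDS)

In this sketch the "processing recipes" of the claim are reduced to a fixed forward pass per network; in practice they could select which stems each network produces and at which rate, which is why the low-rate and high-rate networks are kept as separate models here.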