US 12,254,892 B2
Audio source separation processing workflow systems and methods
Emile de la Rey, Wellington (NZ); and Paris Smaragdis, Urbana, IL (US)
Assigned to WingNut Films Productions Limited, Wellington (NZ)
Filed by WingNut Films Productions Limited, Wellington (NZ)
Filed on Oct. 25, 2022, as Appl. No. 17/973,482.
Application 17/973,482 is a continuation-in-part of application No. 17/848,341, filed on Jun. 23, 2022.
Claims priority of provisional application 63/272,650, filed on Oct. 27, 2021.
Prior Publication US 2023/0130844 A1, Apr. 27, 2023
Int. Cl. G10L 21/0308 (2013.01); G10L 15/06 (2013.01); G10L 15/16 (2006.01)
CPC G10L 21/0308 (2013.01) [G10L 15/063 (2013.01); G10L 15/16 (2013.01)] 17 Claims
OG exemplary drawing
 
12. A method comprising:
training an audio source separation model with a first training dataset to generate a plurality of audio stems from an audio input sample;
receiving a single-track audio input stream comprising a mixture of audio signals generated from a plurality of audio sources;
first separating the audio sources, using the audio source separation model, from the single-track audio input stream in accordance with one or more processing recipes to generate a first plurality of source separated output stems corresponding to one or more of the plurality of audio sources;
retraining the audio source separation model to separate the plurality of audio sources from the single-track audio input stream using, at least in part, a second training dataset comprising at least one of the first plurality of source separated output stems generated from the single-track audio input stream; and
second separating the audio sources, using the retrained audio source separation model, from the single-track audio input stream in accordance with the one or more processing recipes to generate a second plurality of source separated output stems corresponding to the one or more of the plurality of audio sources;
wherein retraining the audio source separation model comprises training a first neural network of the audio source separation model at a first audio sample rate, and a second neural network at a second audio sample rate that is higher than the first audio sample rate.
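The following is a minimal, non-authoritative sketch of the workflow recited in claim 12: train a separation model on a first dataset, run a first separation pass over the single-track input, fold the resulting stems back into a second training dataset, retrain, then run a second pass. It is not the patented implementation; the toy PyTorch architecture, the helper names (SeparatorNet, train, separate), the 8 kHz / 48 kHz rates, and the naive decimation are all assumptions chosen only to illustrate the claimed steps, including the wherein clause's two networks trained at a lower and a higher sample rate.

```python
# Hypothetical sketch of the claim 12 workflow. All names, network sizes,
# and sample rates are illustrative assumptions, not the patent's design.
import torch
import torch.nn as nn

N_SOURCES = 2                         # e.g. dialogue and background music
LOW_RATE, HIGH_RATE = 8_000, 48_000   # assumed first (lower) and second (higher) rates

class SeparatorNet(nn.Module):
    """Toy 1-D convolutional separator: mixture in, N_SOURCES stems out."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=15, padding=7),
            nn.ReLU(),
            nn.Conv1d(channels, N_SOURCES, kernel_size=15, padding=7),
        )

    def forward(self, mix: torch.Tensor) -> torch.Tensor:
        return self.net(mix)          # (batch, N_SOURCES, samples)

def train(model: nn.Module, mixtures: torch.Tensor, stems: torch.Tensor,
          epochs: int = 5) -> None:
    """Supervised training: predict reference stems from their mixture."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(mixtures), stems)
        loss.backward()
        opt.step()

def separate(model: nn.Module, mixture: torch.Tensor) -> torch.Tensor:
    """One separation pass; a 'processing recipe' would select/route stems here."""
    with torch.no_grad():
        return model(mixture)

# --- first training dataset (synthetic placeholders) ------------------------
stems_ref = torch.randn(4, N_SOURCES, HIGH_RATE)    # reference stems
mix_ref = stems_ref.sum(dim=1, keepdim=True)        # their mixtures

# Two networks, per the wherein clause: one trained at the lower rate,
# one trained at the higher rate.
low_net, high_net = SeparatorNet(), SeparatorNet()

# --- initial training and first separation pass ------------------------------
train(high_net, mix_ref, stems_ref)
single_track = torch.randn(1, 1, HIGH_RATE)          # the single-track input stream
first_stems = separate(high_net, single_track)       # first plurality of output stems

# --- retraining on a second dataset that includes the first-pass stems -------
mix2 = torch.cat([mix_ref, single_track])
stems2 = torch.cat([stems_ref, first_stems])
decim = HIGH_RATE // LOW_RATE                        # naive decimation stands in for resampling
train(low_net, mix2[..., ::decim], stems2[..., ::decim])   # first network at 8 kHz
train(high_net, mix2, stems2)                              # second network at 48 kHz

# --- second separation pass with the retrained model -------------------------
second_stems = separate(high_net, single_track)      # second plurality of output stems
print(second_stems.shape)                            # (1, N_SOURCES, 48000)
```

The split into a low-rate and a high-rate network here mirrors only the wherein clause; the claim does not specify how the two networks cooperate at inference time, so the sketch simply retrains both and reuses the high-rate network for the second pass.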