US 11,837,245 B2
Deep learning segmentation of audio using magnitude spectrogram
Luke Miner, San Francisco, CA (US)
Assigned to AUDIOSHAKE, INC., San Francisco, CA (US)
Filed by Audioshake, Inc., San Francisco, CA (US)
Filed on Nov. 1, 2022, as Appl. No. 18/051,860.
Application 18/051,860 is a continuation of application No. 17/061,799, filed on Oct. 2, 2020, granted, now Pat. No. 11,521,630.
Claims priority of provisional application 62/882,317, filed on Aug. 2, 2019.
Prior Publication US 2023/0093726 A1, Mar. 23, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 19/02 (2013.01); G10L 25/30 (2013.01); G10L 21/0272 (2013.01); G06N 3/08 (2023.01)
CPC G10L 19/0216 (2013.01) [G06N 3/08 (2013.01); G10L 21/0272 (2013.01); G10L 25/30 (2013.01)]
20 Claims
 
1. A method for decomposing an audio signal, the method comprising:
transforming an original audio file into a complex spectrogram;
splitting the complex spectrogram into K small fragments along the time dimension;
sending each fragment in the K small fragments through one or more convolutional deep neural networks, the convolutional deep neural networks including one or more convolutional layers, the one or more convolutional layers including a subpixel upsample convolutional layer;
producing a sequence of K mask fragments;
concatenating the K mask fragments together in order to form a complete mask which is the same length as the complex spectrogram;
multiplying the complete mask with the complex spectrogram to create a new complex spectrogram; and
transforming the new complex spectrogram into a new audio file.
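
The steps recited in claim 1 can be illustrated with a minimal NumPy sketch. It is not the patented implementation. The frame size, hop size, Hann window, and overlap-add inverse are assumptions chosen for illustration, and `toy_mask_model` is a hypothetical stand-in for the convolutional deep neural networks: it does not model the subpixel upsample convolutional layer or any trained weights. All function names are illustrative.

```python
import numpy as np

N_FFT = 512  # assumed frame size (not specified in the claim)
HOP = 128    # assumed hop size, giving 75% overlap


def stft(signal, n_fft=N_FFT, hop=HOP):
    """Transform audio samples into a complex spectrogram (freq_bins, frames)."""
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    padded = np.pad(signal, n_fft)       # zero-pad both ends so edges are covered
    n_frames = 1 + (len(padded) - n_fft) // hop
    frames = np.stack(
        [padded[i * hop:i * hop + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    return np.fft.rfft(frames, axis=0)


def istft(spec, length, n_fft=N_FFT, hop=HOP):
    """Transform a complex spectrogram back into audio by weighted overlap-add."""
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec, n=n_fft, axis=0)
    n_frames = spec.shape[1]
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    for i in range(n_frames):
        out[i * hop:i * hop + n_fft] += frames[:, i] * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    out /= np.maximum(norm, 1e-8)        # undo the summed analysis/synthesis windows
    return out[n_fft:n_fft + length]     # drop the padding added in stft()


def toy_mask_model(fragment):
    """Hypothetical stand-in for the network: a soft mask in [0, 1)."""
    mag = np.abs(fragment)
    return mag / (mag + np.median(mag) + 1e-8)


def decompose(signal, k, mask_model):
    """Follow the claimed steps: split, mask each fragment, concatenate, multiply."""
    spec = stft(signal)
    fragments = np.array_split(spec, k, axis=1)          # K fragments along time
    mask_fragments = [mask_model(f) for f in fragments]  # K mask fragments
    mask = np.concatenate(mask_fragments, axis=1)        # same length as spec
    new_spec = mask * spec                               # new complex spectrogram
    return istft(new_spec, len(signal)), mask


# Example: one second of synthetic audio at an assumed 8 kHz sample rate.
t = np.arange(8000) / 8000.0
audio = np.sin(2 * np.pi * 440.0 * t) + 0.1 * np.sin(2 * np.pi * 1200.0 * t)
separated, mask = decompose(audio, k=8, mask_model=toy_mask_model)
```

In this sketch an all-ones mask passes the input through unchanged, which gives a quick check that the transform, fragment, and concatenate steps are consistent before a trained model is substituted for `toy_mask_model`.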