US 11,929,086 B2
Systems and methods for audio source separation via multi-scale feature learning
Vivek Sivaraman Narayanaswamy, Tempe, AZ (US); Andreas Spanias, Tempe, AZ (US); Jayaraman Thiagarajan, Dublin, CA (US); and Sameeksha Katoch, Tempe, AZ (US)
Assigned to Arizona Board of Regents on Behalf of Arizona State University, Tempe, AZ (US); and Lawrence Livermore National Security, LLC, Livermore, CA (US)
Filed by Vivek Sivaraman Narayanaswamy, Tempe, AZ (US); Andreas Spanias, Tempe, AZ (US); Jayaraman Thiagarajan, Dublin, CA (US); and Sameeksha Katoch, Tempe, AZ (US)
Filed on Dec. 14, 2020, as Appl. No. 17/121,131.
Claims priority of provisional application 62/947,871, filed on Dec. 13, 2019.
Prior Publication US 2021/0183401 A1, Jun. 17, 2021
Int. Cl. G10L 21/0308 (2013.01); G06F 16/635 (2019.01); G06N 3/04 (2023.01); G06N 3/08 (2023.01); G10L 25/30 (2013.01)

CPC G10L 21/0308 (2013.01) [G06F 16/635 (2019.01); G06N 3/04 (2013.01); G06N 3/08 (2013.01); G10L 25/30 (2013.01)]

18 Claims

1. A system, comprising:

a computer-implemented neural-network based architecture, including:

a downstream path configured to receive an input mixture, the downstream path comprising:

a plurality of downstream convolutional blocks configured to learn a plurality of features of the input mixture, wherein each downstream convolutional block of the plurality of downstream convolutional blocks includes a plurality of downstream convolutional layers having exponentially varying dilation rates associated with each respective downstream convolutional layer of the plurality of downstream convolutional layers;

wherein a first convolutional layer of the first downstream convolutional block directly receives the input mixture; and

an upstream path in communication with the downstream path, the upstream path configured to output a plurality of source waveforms associated with the input mixture, the upstream path comprising:

a plurality of upstream convolutional blocks configured to learn a plurality of features of the input mixture, wherein each upstream convolutional block of the plurality of upstream convolutional blocks includes a plurality of upstream convolutional layers having exponentially varying dilation rates associated with each respective upstream convolutional layer of the plurality of upstream convolutional layers;

the input mixture being connected directly to a final convolutional layer by a skip connection;

wherein the plurality of upstream convolutional blocks are transposed relative to the plurality of downstream convolutional blocks.
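
The "exponentially varying dilation rates" recited in claim 1 can be illustrated with a minimal pure-Python sketch. It is not the patented implementation. The base of 2, the kernel size of 3, the four layers per block, and the helper names `dilation_rates`, `receptive_field`, `dilated_conv1d`, and `conv_block` are all assumptions made for illustration; the claim fixes none of them. The sketch shows how stacking layers whose dilation grows exponentially widens the span of input samples that a single block can see.

```python
def dilation_rates(num_layers, base=2):
    """Exponentially varying rates base**0, base**1, ... (base 2 is an assumption)."""
    return [base ** i for i in range(num_layers)]


def receptive_field(kernel_size, rates):
    """Number of input samples seen by one output sample of the stacked layers."""
    return 1 + sum((kernel_size - 1) * r for r in rates)


def dilated_conv1d(signal, kernel, dilation):
    """Same-length 1-D dilated convolution (cross-correlation) with zero padding."""
    k = len(kernel)
    span = (k - 1) * dilation          # distance between first and last tap
    pad_left = span // 2
    padded = [0.0] * pad_left + list(signal) + [0.0] * (span - pad_left)
    return [
        sum(kernel[j] * padded[t + j * dilation] for j in range(k))
        for t in range(len(signal))
    ]


def conv_block(signal, kernels, rates):
    """One hypothetical block: dilated layers applied in sequence."""
    out = list(signal)
    for kernel, rate in zip(kernels, rates):
        out = dilated_conv1d(out, kernel, rate)
    return out


# Usage: a four-layer block with kernel size 3 (both values are assumptions).
rates = dilation_rates(4)
print(rates)                      # [1, 2, 4, 8]
print(receptive_field(3, rates))  # 31

# An impulse passed through the block spreads over exactly that many samples.
impulse = [0.0] * 64
impulse[32] = 1.0
response = conv_block(impulse, [[1.0, 1.0, 1.0]] * 4, rates)
print(sum(1 for v in response if v != 0.0))  # 31
```

With these assumed values, four layers cover 31 samples, whereas four undilated layers of the same kernel size would cover only 9. The sketch models a single block only. It does not model the skip connection from the input mixture to the final convolutional layer, nor the transposed relationship between the upstream and downstream blocks.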