US 11,862,187 B2
Systems and methods for jointly estimating sound sources and frequencies from audio
Andreas Jansson, New York, NY (US); and Rachel Bittner, New York, NY (US)
Assigned to Spotify AB, Stockholm (SE)
Filed by Spotify AB, Stockholm (SE)
Filed on May 23, 2022, as Appl. No. 17/751,471.
Application 17/751,471 is a continuation of application No. 16/596,554, filed on Oct. 8, 2019, granted, now Pat. No. 11,355,137.
Prior Publication US 2022/0351747 A1, Nov. 3, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 25/51 (2013.01); G06N 20/00 (2019.01); G06N 3/08 (2023.01); H04L 65/75 (2022.01); G06N 3/045 (2023.01)
CPC G10L 25/51 (2013.01) [G06N 3/045 (2023.01); G06N 3/08 (2013.01); G06N 20/00 (2019.01); H04L 65/75 (2022.05)] 20 Claims
 
1. A method, comprising:
at a first electronic device, the first electronic device having one or more processors and memory storing instructions for execution by the one or more processors:
receiving a first audio content item that includes a plurality of sound sources;
generating a representation of the first audio content item; and
determining, from the representation of the first audio content item:
a representation of an isolated sound source, and
frequency data associated with the isolated sound source, wherein the step of determining the representation of the isolated sound source and the frequency data associated with the isolated sound source includes using a neural network system to jointly determine the representation of the isolated sound source using a first neural network and the frequency data associated with the isolated sound source using a second neural network, wherein weights of the first neural network and weights of the second neural network are trained simultaneously; and
determining that a portion of a second audio content item matches the first audio content item using the representation of the isolated sound source or the frequency data associated with the isolated sound source.
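
Illustrative note (not part of the claim): the joint determination recited above, in which the weights of a first and a second neural network are trained simultaneously, can be pictured with a minimal NumPy sketch. Everything in it is an assumption made for illustration only: the toy random data, the use of a single linear layer for each "network", the choice to feed the first network's output into the second, the summed squared-error loss, and all names (`W1`, `W2`, `joint_loss`, `train_step`). The claim does not specify any of these details.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 64 frames of a 16-bin mixture representation.
n_frames, n_bins = 64, 16
mixture = rng.random((n_frames, n_bins))
target_source = mixture * rng.random((n_frames, n_bins))  # isolated-source target
target_freq = rng.random((n_frames, n_bins))              # frequency-data target

# Stand-ins for the two networks: one linear layer each (assumption).
W1 = rng.normal(scale=0.1, size=(n_bins, n_bins))  # "first network": mixture -> source
W2 = rng.normal(scale=0.1, size=(n_bins, n_bins))  # "second network": source -> frequency


def joint_loss(W1, W2):
    """Sum of the source-reconstruction error and the frequency error."""
    source = mixture @ W1
    freq = source @ W2
    return np.mean((source - target_source) ** 2) + np.mean((freq - target_freq) ** 2)


def train_step(W1, W2, lr=0.1):
    """One gradient step that updates both weight matrices from the same loss."""
    source = mixture @ W1
    freq = source @ W2
    n = source.size
    d_freq = 2.0 * (freq - target_freq) / n
    # The source estimate receives gradient from both loss terms.
    d_source = 2.0 * (source - target_source) / n + d_freq @ W2.T
    grad_W2 = source.T @ d_freq
    grad_W1 = mixture.T @ d_source
    return W1 - lr * grad_W1, W2 - lr * grad_W2


initial = joint_loss(W1, W2)
for _ in range(200):
    W1, W2 = train_step(W1, W2)
final = joint_loss(W1, W2)
print(f"joint loss before: {initial:.4f}, after: {final:.4f}")
```

In this sketch the frequency error propagates back through `W2` into `W1`, so the source estimate is shaped by both objectives. That coupling is one way to read "trained simultaneously"; other arrangements, such as two networks that share only an input and a combined loss, would fit the claim language equally well.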