CPC G10L 25/51 (2013.01) [G06N 3/045 (2023.01); G06N 3/08 (2013.01); G06N 20/00 (2019.01); H04L 65/75 (2022.05)]; 20 Claims
1. A method, comprising:
at a first electronic device, the first electronic device having one or more processors and memory storing instructions for execution by the one or more processors:
receiving a first audio content item that includes a plurality of sound sources;
generating a representation of the first audio content item; and
determining, from the representation of the first audio content item:
a representation of an isolated sound source, and
frequency data associated with the isolated sound source,
wherein the step of determining the representation of the isolated sound source and the frequency data associated with the isolated sound source includes using a neural network system to jointly determine:
the representation of the isolated sound source using a first neural network, and
the frequency data associated with the isolated sound source using a second neural network,
wherein weights of the first neural network and weights of the second neural network are trained simultaneously; and
determining that a portion of a second audio content item matches the first audio content item using the representation of the isolated sound source or the frequency data associated with the isolated sound source.
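The claim's central limitation is the joint, simultaneous training of two networks from a shared input representation: one producing a representation of an isolated sound source and one producing frequency data for that source. The following is a minimal illustrative sketch of that training pattern only, not the patent's actual architecture: the "networks" are single linear layers, the shared representation, targets, dimensions, and learning rate are all invented for the example, and a single combined loss drives one simultaneous update of both weight matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the claim):
# D: shared representation size, M: isolated-source mask size,
# F: number of frequency bins for the pitch/frequency output.
D, M, F = 16, 16, 8
W_sep = rng.normal(0, 0.1, (M, D))    # "first neural network" (separation head)
W_freq = rng.normal(0, 0.1, (F, D))   # "second neural network" (frequency head)

x = rng.normal(size=D)                # shared representation of the audio item
mask_target = rng.uniform(size=M)     # hypothetical isolated-source target
freq_target = rng.uniform(size=F)     # hypothetical frequency-data target

lr = 0.01
losses = []
for _ in range(200):
    mask_pred = W_sep @ x
    freq_pred = W_freq @ x
    e_sep = mask_pred - mask_target
    e_freq = freq_pred - freq_target
    # One joint loss couples both tasks, so both sets of weights
    # are trained simultaneously, as the claim recites.
    loss = (e_sep @ e_sep + e_freq @ e_freq) / 2
    losses.append(loss)
    # Simultaneous gradient step on both networks in the same update.
    W_sep -= lr * np.outer(e_sep, x)
    W_freq -= lr * np.outer(e_freq, x)
```

In a realistic system the two heads would be nonlinear networks sharing an encoder, but the essential property illustrated here is the same: a single backward pass through one combined objective updates the weights of both networks at once.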