CPC G10H 1/0008 (2013.01) [G06N 3/08 (2013.01); G10H 2210/076 (2013.01); G10H 2250/311 (2013.01)] | 20 Claims |
1. A method for implementing supervised metric learning during a training of a deep neural network model, the method comprising:
implementing a deep neural network model configured to receive a song and output embeddings representing the song; and
implementing a music structure analysis framework configured to receive the embeddings, segment the embeddings, and detect repeated portions of the song,
wherein a training of the deep neural network model is implemented by supervised metric learning comprising:
receiving audio input including a plurality of song fragments from a plurality of songs;
for each song fragment of the plurality of song fragments, determining beat information;
for each song fragment of the plurality of song fragments, performing an aligning function to center the song fragment based on the beat information, applying a windowing function to the song fragment based on the center of the song fragment, the windowing function removing at least some audio context of the song fragment, and thereby creating a plurality of aligned song fragments;
for each song fragment of the plurality of song fragments, obtaining an embedding from the deep neural network model;
selecting a batch of aligned song fragments from the plurality of aligned song fragments, the batch of aligned song fragments being associated with a same song of the plurality of songs;
sampling the selected batch of aligned song fragments and selecting a training tuple;
generating a loss metric based on the selected training tuple; and
updating one or more weights of the deep neural network model based on the loss metric.
|
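The training steps recited in claim 1 can be sketched in code. This is a minimal illustration under stated assumptions, not the claimed implementation: the claim names neither the form of the training tuple nor the loss, so a triplet (anchor, positive, negative) with a hinge-style triplet loss is assumed here, as are section labels such as "verse" and "chorus" for supervision. The one-layer linear `embed` function stands in for the deep neural network model, the finite-difference weight update stands in for backpropagation, and all function names are hypothetical.

```python
import random


def align_and_window(samples, beat_index, half_width):
    """Center a fragment on a beat and keep 2 * half_width samples.

    Samples outside the window are discarded (removing audio context);
    positions past the fragment edges are zero-padded.
    """
    window = range(beat_index - half_width, beat_index + half_width)
    return [samples[i] if 0 <= i < len(samples) else 0.0 for i in window]


def embed(fragment, weights):
    """Hypothetical stand-in for the deep network: a single linear layer."""
    return [sum(w * x for w, x in zip(row, fragment)) for row in weights]


def sq_dist(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sample_triplet(labels, rng):
    """Pick (anchor, positive, negative) indices from one song's batch.

    Assumption: anchor and positive share a section label, and the
    negative carries a different label.
    """
    idx = range(len(labels))
    candidates = [
        (a, p, n)
        for a in idx for p in idx for n in idx
        if a != p and labels[a] == labels[p] and labels[a] != labels[n]
    ]
    if not candidates:
        raise ValueError("batch contains no valid triplet")
    return rng.choice(candidates)


def triplet_loss(e_anchor, e_pos, e_neg, margin=1.0):
    """Assumed loss: hinge on the gap between positive and negative distances."""
    return max(0.0, sq_dist(e_anchor, e_pos) - sq_dist(e_anchor, e_neg) + margin)


def train_step(weights, batch, labels, rng, lr=0.01, eps=1e-4):
    """One update: sample a tuple, compute the loss, adjust the weights.

    Gradients are estimated by finite differences only to keep the sketch
    dependency-free; a real model would use backpropagation.
    """
    a, p, n = sample_triplet(labels, rng)

    def loss_for(w):
        return triplet_loss(embed(batch[a], w), embed(batch[p], w), embed(batch[n], w))

    base = loss_for(weights)
    grads = []
    for row in weights:
        grad_row = []
        for c in range(len(row)):
            old = row[c]
            row[c] = old + eps
            grad_row.append((loss_for(weights) - base) / eps)
            row[c] = old  # restore the original weight
        grads.append(grad_row)
    new_weights = [
        [w - lr * g for w, g in zip(row, grad_row)]
        for row, grad_row in zip(weights, grads)
    ]
    return new_weights, base
```

In use, each fragment of a single song would first pass through `align_and_window` with its detected beat position, the resulting aligned fragments would form the batch, and `train_step` would be called repeatedly with that batch and its labels. Beat detection itself is outside the sketch and would come from a separate beat tracker.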