US 12,106,740 B2
Supervised metric learning for music structure features
Ju-Chiang Wang, Los Angeles, CA (US); Jordan Smith, London (GB); and Wei Tsung Lu, Los Angeles, CA (US)
Assigned to Lemon Inc., Grand Cayman (KY)
Filed by Lemon Inc., Grand Cayman (KY)
Filed on Oct. 15, 2021, as Appl. No. 17/502,890.
Prior Publication US 2023/0121764 A1, Apr. 20, 2023
Int. Cl. G06F 17/00 (2019.01); G06N 3/08 (2023.01); G10H 1/00 (2006.01)
CPC G10H 1/0008 (2013.01) [G06N 3/08 (2013.01); G10H 2210/076 (2013.01); G10H 2250/311 (2013.01)]
20 Claims
 
1. A method for implementing supervised metric learning during a training of a deep neural network model, the method comprising:
implementing a deep neural network model configured to receive a song and output embeddings representing the song; and
implementing a music structure analysis framework configured to receive the embeddings, segment the embeddings, and detect repeated portions of the song,
wherein a training of the deep neural network model is implemented by supervised metric learning comprising:
receiving audio input including a plurality of song fragments from a plurality of songs;
for each song fragment of the plurality of song fragments, determining beat information;
for each song fragment of the plurality of song fragments, performing an aligning function to center the song fragment based on the beat information, applying a windowing function to the song fragment based on the center of the song fragment, the windowing function removing at least some audio context of the song fragment, and thereby creating a plurality of aligned song fragments;
for each song fragment of the plurality of song fragments, obtaining an embedding from the deep neural network model;
selecting a batch of aligned song fragments from the plurality of aligned song fragments, the batch of aligned song fragments being associated with a same song of the plurality of songs;
sampling the selected batch of aligned song fragments and selecting a training tuple;
generating a loss metric based on the selected training tuple; and
updating one or more weights of the deep neural network model based on the loss metric.
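
The training steps recited in claim 1 can be illustrated with a minimal sketch. It is one possible reading of the claim, not the patented implementation. Several things here are assumptions that do not appear in the claim text: the training tuple is taken to be a triplet (anchor, positive, negative), the loss metric is a margin-based triplet loss, and supervision comes from hypothetical per-fragment section labels such as "verse" and "chorus". The helper names (`beat_aligned_window`, `sample_triplet`, `triplet_loss`), the synthetic audio, the beat times, and the random linear projection standing in for the deep neural network are also purely illustrative.

```python
import numpy as np

def beat_aligned_window(audio, sr, beat_time, window_sec):
    """Center a fixed-length window on a beat (zero-padded at the edges).

    Truncating to a fixed window discards audio context outside it.
    """
    half = int(round(window_sec * sr / 2))
    center = int(round(beat_time * sr))
    padded = np.pad(audio, (half, half))  # zeros on both sides
    # After padding, original sample `center` sits at index center + half,
    # so this slice covers original samples center - half .. center + half - 1.
    return padded[center:center + 2 * half]

def sample_triplet(labels, rng):
    """Sample (anchor, positive, negative) indices from one song's fragments.

    Assumption: fragments with the same section label are "similar".
    """
    labels = np.asarray(labels)
    idx = np.arange(len(labels))
    # Valid anchors have at least one same-label and one different-label peer.
    candidates = [i for i in idx
                  if np.sum(labels == labels[i]) > 1 and np.any(labels != labels[i])]
    if not candidates:
        raise ValueError("batch needs repeated and distinct labels")
    a = rng.choice(candidates)
    pos = idx[(labels == labels[a]) & (idx != a)]
    neg = idx[labels != labels[a]]
    return int(a), int(rng.choice(pos)), int(rng.choice(neg))

def triplet_loss(embeddings, triplet, margin=0.1):
    """Margin-based triplet loss on squared Euclidean distances (assumed loss)."""
    a, p, n = (embeddings[i] for i in triplet)
    d_ap = np.sum((a - p) ** 2)
    d_an = np.sum((a - n) ** 2)
    return float(max(0.0, d_ap - d_an + margin))

# --- Synthetic demonstration; all values below are hypothetical ---
rng = np.random.default_rng(0)
sr = 100                                   # toy sample rate
audio = rng.standard_normal(10 * sr)       # 10 s of noise standing in for one song
beats = [1.0, 3.0, 5.0, 7.0]               # hypothetical beat times in seconds
labels = ["verse", "verse", "chorus", "chorus"]

# Batch of aligned fragments, all drawn from the same song.
fragments = np.stack([beat_aligned_window(audio, sr, b, 2.0) for b in beats])

# Stand-in for the deep neural network: a random linear projection.
W = rng.standard_normal((fragments.shape[1], 8)) * 0.1
embeddings = fragments @ W
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

triplet = sample_triplet(labels, rng)
loss = triplet_loss(embeddings, triplet)
print("triplet:", triplet, "loss:", loss)
```

The final step of the claim, updating the model weights based on the loss metric, is omitted here. In practice the embedding model and loss would typically be written in an automatic-differentiation framework so that gradients of the loss with respect to the weights can be computed and applied by an optimizer.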