CPC G10L 15/005 (2013.01) [G06F 16/686 (2019.01); G06F 40/263 (2020.01); G06N 5/04 (2013.01); G06N 20/00 (2019.01); G10L 15/063 (2013.01); G10L 15/22 (2013.01); G10L 21/14 (2013.01); G10L 25/18 (2013.01); G10L 25/81 (2013.01)]
17 Claims
1. A method of recognizing languages in music by a machine learning model, comprising:
receiving at least one of audio data of a piece of music or metadata associated with the piece of music by the machine learning model trained to recognize the languages in music, wherein the machine learning model is pre-trained using training data, the training data comprise information indicative of audio data representative of a plurality of music samples and metadata associated with the plurality of music samples, and the training data further comprise information indicating a language corresponding to each of the plurality of music samples, wherein the machine learning model is pre-trained to directly associate audio data and metadata of particular pieces of music with different languages, wherein receiving the audio data of the piece of music comprises generating an image representative of the audio data of the piece of music by preprocessing the audio data of the piece of music, and wherein the image representative of the audio data of the piece of music comprises a spectrogram representative of how a frequency of an audio signal of the piece of music varies with time; and
determining a language associated with the piece of music based on the at least one of the audio data of the piece of music or the metadata associated with the piece of music by the machine learning model.
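The claim recites a spectrogram image but does not say how it is computed. The sketch below is one common way to compute it and is not the patented implementation. It assumes a short-time Fourier transform (STFT) with a Hann window over a mono signal. The function names `spectrogram` and `to_image`, the `frame_len` and `hop` parameters, and the 80 dB dynamic-range floor are hypothetical choices made for illustration. Rows of the result are frequency bins and columns are time frames, so the array records how the frequency content of the audio signal varies with time.

```python
import numpy as np


def spectrogram(signal: np.ndarray, frame_len: int = 1024, hop: int = 512) -> np.ndarray:
    """Magnitude STFT spectrogram, shape (frequency_bins, time_frames).

    Hypothetical preprocessing step; frame_len and hop are illustrative defaults.
    """
    if signal.ndim != 1:
        raise ValueError("expected a mono (1-D) signal")
    # Zero-pad very short clips so at least one full frame exists.
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice overlapping frames and taper each one with the Hann window.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Real FFT per frame gives frame_len // 2 + 1 non-negative frequency bins.
    mag = np.abs(np.fft.rfft(frames, axis=1))
    # Transpose so frequency runs along rows and time along columns.
    return mag.T


def to_image(spec: np.ndarray, floor_db: float = 80.0) -> np.ndarray:
    """Map a magnitude spectrogram to an 8-bit grayscale image (log scale)."""
    db = 20.0 * np.log10(spec + 1e-10)  # epsilon avoids log(0)
    db -= db.max()                      # loudest cell becomes 0 dB
    db = np.clip(db, -floor_db, 0.0)    # keep only the top floor_db of range
    return ((db + floor_db) / floor_db * 255.0).astype(np.uint8)
```

For a one-second 440 Hz sine sampled at 16 kHz, the default settings give 30 frames of 513 bins each, with the peak in bin 28 (440 Hz divided by the 15.625 Hz bin spacing). A model of the kind the claim recites would consume such an image, possibly together with encoded metadata; that model is not sketched here.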