CPC G10L 15/26 (2013.01); 9 Claims

1. A method of generating a summary for sound source data, the method being performed by at least one computing device, the method comprising:
generating an utterance score for at least one speaker based on the sound source data, the utterance score being based on a weighted sum of at least one of a degree of dispersion of utterances or a frequency of utterances of the speaker, wherein the frequency of utterances is determined by:
removing noise from the sound source data using a VAD module, wherein the VAD module receives the sound source data and repeats a binary classification algorithm in which, when an utterance of the speaker is recognized in a predetermined interval of the sound source data, the VAD module outputs a numerical progression, and wherein the VAD module further includes a probability distribution-based classification algorithm for determining whether a given distribution is similar to an utterance or to noise, based on the distribution of utterances and the distribution of noise;
extracting a plurality of feature vectors for the utterances included in the sound source data, wherein the feature vectors represent data points within a latent space that encapsulates the characteristics of the utterances;
analyzing each of the plurality of feature vectors for similarities;
clustering each of the plurality of feature vectors based on the analyzed similarities;
distinguishing a plurality of speakers from one another based on the clustered feature vectors corresponding to each of the utterances; and
wherein the degree of dispersion of utterances is determined by:
identifying one or more utterance indices for each of the at least one speaker based on the sound source data; and
calculating the degree of dispersion for the utterances of each speaker based on the identified one or more utterance indices;
determining a main speaker of the sound source data based on the utterance score for the at least one speaker; and
generating the summary for the sound source data in consideration of the determined main speaker.
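
Claim 1 recites a diarization-style pipeline: interval-level VAD labeling, per-utterance feature vectors, and similarity-based clustering of those vectors into speakers. The following is a minimal sketch of such a pipeline under stated assumptions: frame energy stands in for the VAD module's binary classifier, a normalized magnitude spectrum stands in for the learned utterance embedding, and greedy cosine-similarity clustering stands in for the claimed clustering step; none of the function names, thresholds, or features below are taken from the claim itself.

```python
# Illustrative sketch only; thresholds, features, and function names are assumptions.
import numpy as np

def vad_binary_sequence(signal: np.ndarray, sample_rate: int,
                        interval_s: float = 0.5, energy_threshold: float = 0.01) -> np.ndarray:
    """Label each fixed-length interval 1 (utterance) or 0 (noise/silence)."""
    hop = int(interval_s * sample_rate)
    n_frames = len(signal) // hop
    frames = signal[:n_frames * hop].reshape(n_frames, hop)
    energy = np.mean(frames ** 2, axis=1)           # per-interval energy
    return (energy > energy_threshold).astype(int)  # the "numerical progression" of 0/1 labels

def extract_feature_vectors(signal: np.ndarray, vad_labels: np.ndarray,
                            sample_rate: int, interval_s: float = 0.5) -> np.ndarray:
    """One feature vector per utterance interval (magnitude spectrum as a stand-in embedding)."""
    hop = int(interval_s * sample_rate)
    feats = []
    for i, is_utterance in enumerate(vad_labels):
        if is_utterance:
            frame = signal[i * hop:(i + 1) * hop]
            spectrum = np.abs(np.fft.rfft(frame))
            feats.append(spectrum / (np.linalg.norm(spectrum) + 1e-9))
    return np.array(feats)

def cluster_speakers(feats: np.ndarray, similarity_threshold: float = 0.9) -> np.ndarray:
    """Greedy cosine-similarity clustering: each cluster approximates one speaker."""
    centroids, assignments = [], []
    for v in feats:
        sims = [float(v @ c) for c in centroids]
        if sims and max(sims) >= similarity_threshold:
            assignments.append(int(np.argmax(sims)))
        else:
            centroids.append(v)
            assignments.append(len(centroids) - 1)
    return np.array(assignments)
```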
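The utterance score itself is a weighted sum of the dispersion of a speaker's utterance indices and that speaker's utterance frequency, and the main speaker is the speaker whose score is highest. Below is a hedged sketch of that computation; the equal 0.5/0.5 weights, the use of standard deviation as the degree of dispersion, and the definition of frequency as the speaker's share of VAD intervals are assumptions for illustration, not taken from the claim.

```python
# Illustrative sketch only; weights and the dispersion/frequency definitions are assumptions.
import numpy as np

def utterance_scores(speaker_labels: np.ndarray, vad_labels: np.ndarray,
                     w_dispersion: float = 0.5, w_frequency: float = 0.5) -> dict:
    """Weighted sum of utterance-index dispersion and utterance frequency per speaker."""
    speech_indices = np.flatnonzero(vad_labels)  # interval indices that contain an utterance
    scores = {}
    for speaker in np.unique(speaker_labels):
        idx = speech_indices[speaker_labels == speaker]   # this speaker's utterance indices
        dispersion = float(np.std(idx)) if len(idx) > 1 else 0.0
        frequency = len(idx) / max(len(vad_labels), 1)    # share of intervals this speaker occupies
        scores[int(speaker)] = w_dispersion * dispersion + w_frequency * frequency
    return scores

def main_speaker(scores: dict) -> int:
    """The main speaker is the speaker with the highest utterance score."""
    return max(scores, key=scores.get)
```

The summary-generation step would then condition on the returned main speaker, for example by weighting that speaker's segments more heavily when the summary is produced.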