CPC G10L 15/26 (2013.01) [G10L 17/00 (2013.01); G10L 19/038 (2013.01); G10L 15/02 (2013.01); G10L 15/22 (2013.01); G10L 15/30 (2013.01)] | 18 Claims |
1. A method of performing diarization on a sound recording, the method comprising:
receiving a sound recording;
breaking the sound recording into a plurality of chunks;
performing a first diarization on the plurality of chunks, wherein the performing the first diarization on the plurality of chunks occurs simultaneously, and wherein the performing includes breaking each of the plurality of chunks into a plurality of segments, for each of the plurality of segments generating statistical speaker information descriptive of the sound characteristics in that segment, and clustering, within each chunk of the plurality of chunks, segments having similar statistical speaker information to generate within each chunk of the plurality of chunks groups of segments grouped according to the similar statistical speaker information;
performing a second diarization by clustering, between the plurality of chunks, the groups of segments according to grouped similar statistical speaker information, the grouped similar statistical speaker information being characteristics of speech of each group for the groups of segments, wherein the second diarization performs a modified I-vector scoring based on the groups of segments according to grouped similar statistical speaker information, in which I-vectors of the groups of segments according to grouped similar statistical speaker information are averaged and then compared to other averaged I-vectors, and where a closeness of two or more averaged I-vectors is compared and the averaged I-vectors are accordingly clustered based on similarity; and
creating a new I-vector for the groups of segments according to grouped similar statistical speaker information.
|
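The two-pass flow recited in claim 1 can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the claim: the function names (`diarize_chunk`, `diarize`, `greedy_cluster`), the use of cosine similarity with a fixed threshold as the closeness measure, the greedy clustering rule, and the thread pool standing in for simultaneous processing of chunks. Real I-vectors come from a trained total-variability model; the short lists of floats below are toy stand-ins for per-segment statistical speaker information.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed closeness measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Element-wise average of a non-empty list of vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def greedy_cluster(vectors, threshold):
    """Group vector indices: join the first cluster whose mean is close enough,
    otherwise start a new cluster. A hypothetical stand-in for the clustering step."""
    clusters = []
    for index, vector in enumerate(vectors):
        for members in clusters:
            centroid = mean_vector([vectors[j] for j in members])
            if cosine(vector, centroid) >= threshold:
                members.append(index)
                break
        else:
            clusters.append([index])
    return clusters


def diarize_chunk(segment_vectors, threshold=0.9):
    """First pass: cluster the segments of one chunk and return one averaged
    vector per group of similar segments."""
    groups = greedy_cluster(segment_vectors, threshold)
    return [mean_vector([segment_vectors[i] for i in group]) for group in groups]


def diarize(chunks, threshold=0.9):
    """Run the first pass on all chunks concurrently, then cluster the averaged
    group vectors across chunks and build a new vector per merged group."""
    with ThreadPoolExecutor() as pool:
        per_chunk = list(pool.map(lambda c: diarize_chunk(c, threshold), chunks))
    averaged = [vector for groups in per_chunk for vector in groups]
    merged = greedy_cluster(averaged, threshold)
    # Assumption: the "new" vector for a merged group is the mean of its averaged vectors.
    return [mean_vector([averaged[i] for i in group]) for group in merged]


# Toy input: two chunks, each a list of per-segment vectors.
chunks = [
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]],
    [[0.1, 0.9], [1.0, 0.05]],
]
speakers = diarize(chunks)
print(len(speakers))
```

With the toy input, the segments pointing along the first axis and those pointing along the second axis end up in two merged groups, one per assumed speaker. Averaging before comparing keeps the cross-chunk pass small, since it compares one vector per group instead of every segment against every other segment.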