| CPC G10L 17/14 (2013.01) [G10L 17/22 (2013.01); G10L 21/028 (2013.01)] | 8 Claims |

|
1. A speaker diarization correction method performed by a computer apparatus comprising at least one processor, the method using the at least one processor and comprising:
performing speaker diarization on an input audio stream;
recognizing a speech included in the input audio stream and converting the speech to text;
detecting a speaker change based on the converted text; and
correcting the speaker diarization based on the detected speaker change,
wherein the detecting of the speaker change comprises:
receiving a speech recognition result for each utterance section, wherein each utterance section consists of at least one word unit, and further wherein each word unit comprises a single word of text;
encoding text included in the speech recognition result for each utterance section into one or more word units of text, wherein the encoding of the text into the one or more word units of text comprises encoding an EndPoint Detection (EPD) unit text included in the speech recognition result for each utterance section into the one or more word units of text using sentence Bidirectional Encoder Representations from Transformers (sBERT);
encoding each of the word units of text such that a conversation context is considered; and
determining, individually for each word unit of text encoded with the conversation context considered, whether a speaker change relative to the previous word unit of text is present.
|
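The claim does not fix a rule for "correcting the speaker diarization based on the detected speaker change." The Python sketch below assumes one possible rule: split the recognized words into segments wherever a per-word speaker change is predicted, then relabel each segment with the majority diarization label among its words. The names `Word`, `ChangeDetector`, `split_at_changes`, `correct_diarization`, and `toy_detector` are hypothetical. `toy_detector` is only a stand-in for the claimed sBERT word encoding, context encoding, and per-word change decision, none of which are implemented here. For simplicity the sketch also flattens the per-utterance-section (EPD) structure into a single word sequence.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Word:
    text: str     # one word unit of recognized text
    speaker: str  # label assigned by the acoustic speaker diarization


# Hypothetical detector signature: given all word texts, return one flag per word,
# True if a speaker change relative to the previous word is predicted.
# In the claim this role is played by sBERT encoding plus a context-aware encoder.
ChangeDetector = Callable[[Sequence[str]], List[bool]]


def split_at_changes(words: Sequence[Word], changes: Sequence[bool]) -> List[List[Word]]:
    """Group consecutive words, starting a new segment at each predicted change."""
    if len(words) != len(changes):
        raise ValueError("need exactly one change flag per word unit")
    segments: List[List[Word]] = []
    for i, (word, changed) in enumerate(zip(words, changes)):
        if i == 0 or changed:
            segments.append([])
        segments[-1].append(word)
    return segments


def correct_diarization(words: Sequence[Word], detect: ChangeDetector) -> List[Word]:
    """Relabel each text-derived segment with its majority diarization label.

    This majority-vote rule is an assumption; ties go to the label seen first
    in the segment, because Counter.most_common keeps insertion order on ties.
    """
    changes = detect([w.text for w in words])
    corrected: List[Word] = []
    for segment in split_at_changes(words, changes):
        majority, _ = Counter(w.speaker for w in segment).most_common(1)[0]
        corrected.extend(Word(w.text, majority) for w in segment)
    return corrected


def toy_detector(texts: Sequence[str]) -> List[bool]:
    """Toy stand-in for the trained model: predict a change after a word ending in '?'."""
    return [i > 0 and texts[i - 1].endswith("?") for i in range(len(texts))]


if __name__ == "__main__":
    asr_words = [Word("how", "A"), Word("are", "A"), Word("you?", "B"),
                 Word("fine", "B"), Word("thanks", "A")]
    fixed = correct_diarization(asr_words, toy_detector)
    print([(w.text, w.speaker) for w in fixed])
```

In this example the diarizer's stray "B" on "you?" and stray "A" on "thanks" are overridden, because the text-based detector places the only speaker change before "fine". Passing the detector in as a callable keeps the correction step independent of whichever encoder and classifier actually produce the per-word decisions.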