| CPC G10L 25/78 (2013.01) [G10L 21/10 (2013.01)] | 9 Claims |

|
1. A method for detecting a synthetic voice based on a biological sound comprising:
receiving an audio stream;
extracting a biological feature vector corresponding to a meaningless voice from the audio stream;
extracting a synthetic voice feature vector from the audio stream;
combining the biological feature vector and the synthetic voice feature vector to generate a combined feature vector; and
determining whether the audio stream is a synthetic voice based on the combined feature vector,
wherein extracting the biological feature vector comprises extracting the biological feature vector by inputting the audio stream to a pre-trained biological sound segmentation model, encoding the biological feature vector using a sequence model to extract encoded data corresponding to the last hidden state of the sequence model, and converting the encoded data into a scoring embedding vector of length H through a fully connected layer without an activation function,
wherein the biological sound segmentation model extracts the biological feature vector by converting the audio stream into a spectrogram, dividing the spectrogram into a plurality of frames, classifying a biological sound type for each divided frame, and assigning a corresponding ID to each classified biological sound type.
|
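The data flow recited in claim 1 can be illustrated with a minimal NumPy sketch. It is not the patented implementation. Everything the claim leaves open is an assumption here: the randomly initialised weights stand in for the pre-trained biological sound segmentation model, a plain tanh recurrent cell stands in for the sequence model, the mean log-spectrum stands in for the synthetic voice feature vector, concatenation is used for the combining step, and a logistic score is used for the determination. All names and sizes (`N_TYPES`, `HIDDEN`, `FRAME_LEN`, `HOP`, `make_params`, `detect`, and so on) are hypothetical.

```python
import numpy as np

N_TYPES = 3                # hypothetical number of biological sound types
H = 8                      # length H of the scoring embedding vector
HIDDEN = 16                # hypothetical hidden size of the sequence model
FRAME_LEN, HOP = 256, 128  # hypothetical framing parameters
N_BINS = FRAME_LEN // 2 + 1


def make_params(seed=0):
    """Random weights standing in for trained ones (assumption)."""
    rng = np.random.default_rng(seed)
    shapes = {
        "W_cls": (N_BINS, N_TYPES),  # per-frame sound-type classifier
        "emb": (N_TYPES, HIDDEN),    # one input vector per sound-type ID
        "W_h": (HIDDEN, HIDDEN),     # recurrent weights
        "W_fc": (HIDDEN, H),         # fully connected layer
        "b_fc": (H,),
        "w_out": (H + N_BINS,),      # final detector weights
    }
    return {k: rng.normal(scale=0.1, size=s) for k, s in shapes.items()}


def spectrogram(audio):
    """Convert the audio into a magnitude spectrogram, one row per frame."""
    n_frames = 1 + (len(audio) - FRAME_LEN) // HOP
    window = np.hanning(FRAME_LEN)
    frames = np.stack(
        [audio[i * HOP : i * HOP + FRAME_LEN] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def biological_ids(spec, p):
    """Classify a sound type for each frame and return its integer ID."""
    return (np.log1p(spec) @ p["W_cls"]).argmax(axis=1)


def scoring_embedding(ids, p):
    """Encode the ID sequence, keep the last hidden state, then apply a
    fully connected layer with no activation function."""
    h = np.zeros(HIDDEN)
    for i in ids:
        h = np.tanh(p["emb"][i] + h @ p["W_h"])
    return h @ p["W_fc"] + p["b_fc"]  # linear output of length H


def synthetic_features(spec):
    """Placeholder synthetic voice feature vector (assumption)."""
    return np.log1p(spec).mean(axis=0)


def detect(audio, p):
    """Return the combined feature vector and a synthetic-voice score."""
    spec = spectrogram(audio)
    bio = scoring_embedding(biological_ids(spec, p), p)
    syn = synthetic_features(spec)
    combined = np.concatenate([bio, syn])  # combining step (assumed concat)
    prob = 1.0 / (1.0 + np.exp(-(combined @ p["w_out"])))
    return combined, prob
```

Calling `detect(audio, make_params())` on a one-dimensional float array of at least `FRAME_LEN` samples returns a combined vector of length `H + N_BINS` and a score between 0 and 1. With random weights the score carries no meaning; the sketch only shows how the claimed steps connect.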