CPC G10L 25/78 (2013.01) [G10L 19/0204 (2013.01); G10L 25/18 (2013.01); H04S 7/00 (2013.01); G10L 2025/783 (2013.01)] | 20 Claims |
8. An electronic device for enhancing detection of synthetic voice data comprising:
a processor; and
a memory configured to store data, said electronic device being associated with a network and said memory being in communication with said processor and having instructions stored thereon which, when read and executed by said processor, cause said electronic device to:
convert monophonic voice data into stereophonic voice data, the stereophonic voice data comprising a first channel signal and a second channel signal;
decompose, by a trained machine learning model operated by said electronic device, the stereophonic voice data into a mid-signal and a side signal, the side signal representing a difference between the first and second channel signals;
analyze the side signal to detect structured artifacts associated with synthetic voice generation, the structured artifacts being detected based on deviations from expected patterns in natural human speech;
conduct a spectral analysis of the side signal to detect secondary artifacts, the secondary artifacts include frequency components or modulations uncharacteristic of human speech;
determine artifacts indicative of synthetic generation in the structured and secondary artifacts;
calculate, based on the determined artifacts, a probability score reflecting the likelihood the monophonic voice data was synthetically generated;
compare the probability score against a threshold value; and
in response to determining the probability score satisfies the threshold value, determine there is a high likelihood that the monophonic voice data includes synthetic artifacts and generate an alert indicating the monophonic voice data is potentially fraudulent.
|
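The processing chain recited in claim 8 can be illustrated with a minimal Python sketch. It is a simplified illustration under stated assumptions, not the claimed implementation. The claim does not say how the monophonic data is converted to stereo, so the sketch assumes a hypothetical delay-based conversion (plain duplication of the channel would leave the side signal at zero). The claim recites a trained machine learning model for the decomposition; the sketch substitutes the conventional sum and difference formulas as a stand-in. The feature (share of side-signal energy above a cutoff), the logistic mapping to a score, the default parameter values, and all function names are likewise hypothetical, and "satisfies the threshold" is assumed to mean greater than or equal to.

```python
import math

def mono_to_stereo(mono, delay=3):
    """Assumed conversion: left is the input, right is a copy delayed by `delay` samples."""
    left = list(mono)
    right = ([0.0] * delay + list(mono))[:len(mono)]
    return left, right

def mid_side(left, right):
    """Stand-in for the trained model: conventional mid/side decomposition."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]  # difference of the two channels
    return mid, side

def magnitude_spectrum(signal):
    """Naive O(n^2) DFT magnitudes for bins 0 through n // 2."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(signal))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(signal))
        mags.append(math.hypot(re, im))
    return mags

def high_band_ratio(signal, sample_rate, cutoff_hz):
    """Hypothetical feature: fraction of spectral energy at or above cutoff_hz."""
    mags = magnitude_spectrum(signal)
    n = len(signal)
    total = sum(m * m for m in mags)
    if total == 0.0:
        return 0.0  # silent side signal: nothing to measure
    high = sum(m * m for k, m in enumerate(mags) if k * sample_rate / n >= cutoff_hz)
    return high / total

def synthetic_probability(ratio, midpoint=0.5, steepness=10.0):
    """Hypothetical logistic mapping from the feature to a score between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-steepness * (ratio - midpoint)))

def screen_voice_sample(mono, sample_rate=8000, cutoff_hz=3000.0, threshold=0.8):
    """Run the sketched chain and return (score, alert_flag)."""
    left, right = mono_to_stereo(mono)
    _, side = mid_side(left, right)
    score = synthetic_probability(high_band_ratio(side, sample_rate, cutoff_hz))
    return score, score >= threshold  # assumed reading of "satisfies the threshold"

if __name__ == "__main__":
    # Synthetic test tone, not real voice data.
    tone = [math.sin(2 * math.pi * 3500 * t / 8000) for t in range(64)]
    score, alert = screen_voice_sample(tone)
    print(f"score={score:.3f} alert={alert}")
```

In this sketch a single spectral feature stands in for both the structured and the secondary artifacts recited in the claim; a practical system would combine several detectors before computing the score.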