US 12,131,750 B1
	Methods and systems for enhancing the detection of synthetic voice data
Raphael A. Rodriguez, Marco Island, FL (US); Olena Mizynchuk, Marco Island, FL (US); and Davyd Mizynchuk, Kyiv (UA)
Assigned to Daon Technology, Douglas (IM)
Filed by Daon Technology, Douglas (IM)
Filed on May 10, 2024, as Appl. No. 18/660,784.
Int. Cl. G10L 25/78 (2013.01); G10L 17/00 (2013.01); G10L 19/02 (2013.01); G10L 25/18 (2013.01); H04S 7/00 (2006.01)

CPC G10L 25/78 (2013.01) [G10L 19/0204 (2013.01); G10L 25/18 (2013.01); H04S 7/00 (2013.01); G10L 2025/783 (2013.01)]

20 Claims

8. An electronic device for enhancing detection of synthetic voice data comprising:

a processor; and

a memory configured to store data, said electronic device being associated with a network and said memory being in communication with said processor and having instructions stored thereon which, when read and executed by said processor, cause said electronic device to:

convert monophonic voice data into stereophonic voice data, the stereophonic voice data comprising a first channel signal and a second channel signal;

decompose, by a trained machine learning model operated by said electronic device, the stereophonic voice data into a mid-signal and a side signal, the side signal representing a difference between the first and second channel signals;

analyze the side signal to detect structured artifacts associated with synthetic voice generation, the structured artifacts being detected based on deviations from expected patterns in natural human speech;

conduct a spectral analysis of the side signal to detect secondary artifacts, the secondary artifacts include frequency components or modulations uncharacteristic of human speech;

determine artifacts indicative of synthetic generation in the structured and secondary artifacts;

calculate, based on the determined artifacts, a probability score reflecting the likelihood the monophonic voice data was synthetically generated;

compare the probability score against a threshold value; and

in response to determining the probability score satisfies the threshold value, determine there is a high likelihood that the monophonic voice data includes synthetic artifacts and generate an alert indicating the monophonic voice data is potentially fraudulent.
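
The signal-processing steps recited in claim 8 can be illustrated with a minimal sketch. It is not the claimed implementation: the claim performs the decomposition with a trained machine learning model, whereas this sketch substitutes conventional sum/difference arithmetic, and it starts from two channel signals that are already available rather than converting monophonic data. The function names, the `cutoff_bin` parameter, the use of a high-band energy ratio as a stand-in for the probability score, the default threshold of 0.5, and the reading of "satisfies the threshold" as "greater than or equal to" are all assumptions made for illustration.

```python
import cmath
import math


def mid_side(left, right):
    """Split two channel signals into mid (average) and side (half-difference)."""
    if len(left) != len(right):
        raise ValueError("channels must have equal length")
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def magnitude_spectrum(signal):
    """Naive O(n^2) DFT magnitudes for bins 0..n//2 (adequate for short frames)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def high_band_ratio(side, cutoff_bin):
    """Fraction of side-signal spectral energy at or above cutoff_bin.

    Hypothetical feature standing in for the claimed probability score;
    it is not a calibrated probability.
    """
    energy = [m * m for m in magnitude_spectrum(side)]
    total = sum(energy)
    return sum(energy[cutoff_bin:]) / total if total > 0 else 0.0


def is_flagged(score, threshold=0.5):
    """Assumes 'satisfies the threshold' means score >= threshold."""
    return score >= threshold


# Toy example: the first channel carries an extra high-frequency component
# that the second channel lacks, so it survives only in the side signal.
n = 16
right = [math.sin(2 * math.pi * 1 * t / n) for t in range(n)]
left = [r + 0.1 * math.sin(2 * math.pi * 6 * t / n) for t, r in enumerate(right)]

mid, side = mid_side(left, right)
score = high_band_ratio(side, cutoff_bin=4)
alert = is_flagged(score)
```

In this toy input the side signal contains only the component that differs between the channels, which is why inter-channel differences are a natural place to look for artifacts. When both channels are identical the side signal is zero and the sketch reports a score of 0.0.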