| CPC G10L 17/26 (2013.01) [G06Q 20/407 (2013.01); G10L 15/1807 (2013.01); H04M 3/42221 (2013.01); H04M 3/58 (2013.01)] | 22 Claims |

|
1. A method for analyzing audio speech signals to detect fraudulent calls to a contact center, the method comprising:
splitting an audio recording of a call in real-time into a foreground speech signal attributed to a main speaker and a background audio signal;
extracting audio features from the foreground speech signal and the background audio signal;
inputting the extracted audio features into an ensemble model, wherein the ensemble model comprises multiple different machine learning models co-trained to cumulatively detect fraud, wherein the multiple different machine learning models include:
a speaker audio model to detect audio speech anomalies in the foreground speech signal attributed by clustering to the main speaker,
a speaker intent model to classify intent of the main speaker in the foreground speech signal using a large language model and call transcription,
a synthetic audio model to detect if the main speaker is real or synthetic, and
a prosody model to detect voice intonation of the main speaker in the foreground speech signal; and
outputting, by the ensemble model, a prediction of whether the call is fraudulent.
|
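The data flow recited in claim 1 (split, extract features, score with several models, combine into one prediction) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the claim: the amplitude-threshold split stands in for real speaker separation, the energy and zero-crossing features stand in for the unspecified audio features, the four sub-models are stubs returning fixed scores, and the weighted average is only one of many ways an ensemble might combine its members. The names `split_call`, `extract_features`, `SubModel`, and `ensemble_predict` are hypothetical, and the co-training recited in the claim is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Features = Dict[str, float]


def split_call(samples: Sequence[float], threshold: float = 0.1) -> Tuple[List[float], List[float]]:
    """Placeholder split (assumption): loud samples are treated as foreground
    speech, quiet samples as background audio."""
    foreground = [s for s in samples if abs(s) >= threshold]
    background = [s for s in samples if abs(s) < threshold]
    return foreground, background


def extract_features(signal: Sequence[float], prefix: str) -> Features:
    """Toy features (assumption): mean energy and zero-crossing rate."""
    n = len(signal)
    energy = sum(s * s for s in signal) / n if n else 0.0
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (n - 1) if n > 1 else 0.0
    return {f"{prefix}_energy": energy, f"{prefix}_zcr": zcr}


@dataclass
class SubModel:
    name: str
    score: Callable[[Features], float]  # fraud score in [0, 1]
    weight: float


def ensemble_predict(features: Features, models: Sequence[SubModel],
                     threshold: float = 0.5) -> Tuple[bool, float]:
    """Combine sub-model scores by weighted average (an assumed combination rule)."""
    total_weight = sum(m.weight for m in models)
    combined = sum(m.weight * m.score(features) for m in models) / total_weight
    return combined >= threshold, combined


if __name__ == "__main__":
    # Synthetic samples, not real call audio.
    fg, bg = split_call([0.5, -0.4, 0.02, -0.01, 0.6])
    feats = {**extract_features(fg, "fg"), **extract_features(bg, "bg")}
    # Stub sub-models named after the four recited in the claim; scores are made up.
    models = [
        SubModel("speaker_audio", lambda f: 0.25, 1.0),
        SubModel("speaker_intent", lambda f: 0.75, 1.0),
        SubModel("synthetic_audio", lambda f: 1.0, 1.0),
        SubModel("prosody", lambda f: 0.0, 1.0),
    ]
    is_fraud, score = ensemble_predict(feats, models)
    print(is_fraud, score)
```

In a real system each stub would be replaced by a trained model, and the combination rule and decision threshold would be learned or tuned rather than fixed by hand.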