US 12,323,554 B1
Audio speech signal analysis for fraud detection
Cheryl Fernandes, Plainsboro, NJ (US); Mehak Mehta, Jersey City, NJ (US); Aratrika Sarkar, Jersey City, NJ (US); and Melissa Kagaju, New York, NY (US)
Assigned to Morgan Stanley Services Group Inc., New York, NY (US)
Filed by Morgan Stanley Services Group Inc., New York, NY (US)
Filed on Nov. 10, 2024, as Appl. No. 18/942,728.
Int. Cl. H04M 1/56 (2006.01); G06F 40/40 (2020.01); H04M 3/22 (2006.01); H04M 3/51 (2006.01)

CPC H04M 3/2281 (2013.01) [G06F 40/40 (2020.01); H04M 3/5175 (2013.01)]

20 Claims

1. A method for analyzing audio speech signals to detect fraudulent calls to a contact center, the method comprising:

splitting an audio recording of a call in real-time into a foreground speech signal attributed to a main speaker and a background audio signal;

extracting audio features from the foreground speech signal and the background audio signal;

inputting the extracted audio features into an ensemble model, wherein the ensemble model comprises multiple different machine learning models co-trained to cumulatively detect fraud, wherein the multiple different machine learning models include:

a speaker audio model to detect audio speech anomalies in the foreground speech signal attributed by clustering to the main speaker,

a speaker intent model to classify intent of the main speaker in the foreground speech signal using a large language model and call transcription, and

a prosody model to detect voice intonation of the main speaker in the foreground speech signal; and

outputting, by the ensemble model, a prediction of whether the call is fraudulent.
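
For illustration only, the steps recited in claim 1 can be sketched in Python as shown below. The claim does not specify how the recording is separated, which audio features are extracted, or how the outputs of the member models are combined. The energy-threshold split, the RMS-based features, the weighted average, the 0.5 decision threshold, and all names (`split_frames`, `extract_features`, `EnsembleMember`, `ensemble_predict`) are therefore assumptions of this sketch, not the claimed implementation. The three member models are stubbed as plain scoring functions standing in for the speaker audio, speaker intent, and prosody models.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Frame = Sequence[float]
Features = Dict[str, float]


def rms(frame: Frame) -> float:
    """Root-mean-square energy of one frame; 0.0 for an empty frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def split_frames(frames: List[Frame], energy_threshold: float) -> Tuple[List[Frame], List[Frame]]:
    """Toy stand-in for the splitting step (assumption): frames at or above
    the energy threshold count as foreground, the rest as background."""
    foreground: List[Frame] = []
    background: List[Frame] = []
    for frame in frames:
        target = foreground if rms(frame) >= energy_threshold else background
        target.append(frame)
    return foreground, background


def _mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def extract_features(foreground: List[Frame], background: List[Frame]) -> Features:
    """Hypothetical feature set computed from both signals."""
    total = len(foreground) + len(background)
    return {
        "fg_mean_rms": _mean([rms(f) for f in foreground]),
        "bg_mean_rms": _mean([rms(f) for f in background]),
        "fg_frame_ratio": len(foreground) / total if total else 0.0,
    }


@dataclass
class EnsembleMember:
    """One member model: a scoring function returning a fraud score in [0, 1]."""
    name: str
    score: Callable[[Features], float]
    weight: float = 1.0


def ensemble_predict(members: List[EnsembleMember], features: Features,
                     threshold: float = 0.5) -> Tuple[float, bool]:
    """Combine member scores by weighted average (assumption) and
    return (combined score, is_fraud)."""
    total_weight = sum(m.weight for m in members)
    if total_weight <= 0:
        raise ValueError("ensemble needs at least one member with positive weight")
    combined = sum(
        m.weight * min(1.0, max(0.0, m.score(features))) for m in members
    ) / total_weight
    return combined, combined >= threshold


if __name__ == "__main__":
    # Two synthetic frames: one loud (speech-like), one quiet (background-like).
    frames = [[0.5, -0.5, 0.5, -0.5], [0.01, -0.01, 0.01, -0.01]]
    fg, bg = split_frames(frames, energy_threshold=0.1)
    feats = extract_features(fg, bg)
    # Stub scorers; real members would be trained models.
    members = [
        EnsembleMember("speaker_audio", lambda f: f["bg_mean_rms"] * 10),
        EnsembleMember("speaker_intent", lambda f: 0.5),
        EnsembleMember("prosody", lambda f: f["fg_mean_rms"]),
    ]
    print(ensemble_predict(members, feats))
```

A weighted average is only one way to fuse the member outputs. The claim recites models that are co-trained, which suggests a jointly learned combination; a trained meta-classifier over the member scores would be another option under the same interface.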