US 12,380,896 B1
Audio speech signal analysis for fraud detection
Sushil Ninawe, Mumbai (IN); Jayati Tripathi, Bengaluru (IN); Cheryl Fernandes, Plainsboro, NJ (US); Mehak Mehta, Jersey City, NJ (US); Aratrika Sarkar, Jersey City, NJ (US); and Melissa Kagaju, New York, NY (US)
Assigned to Morgan Stanley Services Group Inc., New York, NY (US)
Filed by Morgan Stanley Services Group Inc., New York, NY (US)
Filed on Apr. 30, 2025, as Appl. No. 19/194,354.
Application 19/194,354 is a continuation in part of application No. 18/942,728, filed on Nov. 10, 2024, granted, now 12,323,554.
Int. Cl. G10L 21/00 (2013.01); G06Q 20/40 (2012.01); G10L 15/18 (2013.01); G10L 17/26 (2013.01); H04M 3/42 (2006.01); H04M 3/58 (2006.01)
CPC G10L 17/26 (2013.01) [G06Q 20/407 (2013.01); G10L 15/1807 (2013.01); H04M 3/42221 (2013.01); H04M 3/58 (2013.01)] 22 Claims
 
1. A method for analyzing audio speech signals to detect fraudulent calls to a contact center, the method comprising:
splitting an audio recording of a call in real-time into a foreground speech signal attributed to a main speaker and a background audio signal;
extracting audio features from the foreground speech signal and the background audio signal;
inputting the extracted audio features into an ensemble model, wherein the ensemble model comprises multiple different machine learning models co-trained to cumulatively detect fraud, wherein the multiple different machine learning models include:
a speaker audio model to detect audio speech anomalies in the foreground speech signal attributed by clustering to the main speaker,
a speaker intent model to classify intent of the main speaker in the foreground speech signal using a large language model and call transcription,
a synthetic audio model to detect if the main speaker is real or synthetic, and
a prosody model to detect voice intonation of the main speaker in the foreground speech signal; and
outputting, by the ensemble model, a prediction of whether the call is fraudulent.
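Claim 1 names the pipeline stages and the four sub-models. It does not say how the split is performed, which features are extracted, or how the ensemble combines its members. The Python sketch below is one hypothetical way data could flow through such a pipeline. It is not the patented implementation, and it rests on several stated assumptions:

- An RMS energy gate stands in for the foreground/background split. The claim's speaker clustering is not modeled.
- Mean and spread of RMS energy, plus zero-crossing rate, stand in for the extracted audio features.
- The four sub-models are toy heuristic stubs.
- `intent_score` is a hypothetical input that an LLM classifier over the call transcription would supply.
- A hand-weighted logistic (stacking-style) combiner replaces the co-trained ensemble.

All function names, thresholds, and weights are illustrative.

```python
import math
import statistics
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Frame = List[float]  # one short window of PCM samples, nominally in [-1.0, 1.0]
Features = Dict[str, float]


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose sign differs."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def split_foreground_background(
    frames: List[Frame], threshold: float
) -> Tuple[List[Frame], List[Frame]]:
    """Hypothetical energy gate: frames at or above `threshold` RMS go to the
    foreground (main speaker); the rest go to the background."""
    foreground = [f for f in frames if rms(f) >= threshold]
    background = [f for f in frames if rms(f) < threshold]
    return foreground, background


def extract_features(frames: List[Frame], prefix: str) -> Features:
    """Mean and spread of frame energy, plus mean zero-crossing rate."""
    energies = [rms(f) for f in frames] or [0.0]
    zcrs = [zero_crossing_rate(f) for f in frames] or [0.0]
    return {
        f"{prefix}_energy": statistics.fmean(energies),
        f"{prefix}_energy_std": statistics.pstdev(energies),
        f"{prefix}_zcr": statistics.fmean(zcrs),
    }


@dataclass
class Ensemble:
    """Stacking-style combiner: logistic function of weighted sub-model scores.
    Weights are fixed by hand here, not co-trained as in the claim."""
    models: Dict[str, Callable[[Features], float]]
    weights: Dict[str, float]
    bias: float = 0.0

    def predict_proba(self, features: Features) -> float:
        z = self.bias + sum(self.weights[name] * model(features)
                            for name, model in self.models.items())
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, features: Features, threshold: float = 0.5) -> bool:
        return self.predict_proba(features) >= threshold


# Toy heuristics standing in for the four trained sub-models named in the claim;
# each returns a fraud score in [0, 1].
SUB_MODELS: Dict[str, Callable[[Features], float]] = {
    # Speaker audio model: unusually noisy foreground speech.
    "speaker_audio": lambda x: min(1.0, 2.0 * x["fg_zcr"]),
    # Speaker intent model: `intent_score` is a hypothetical input that an
    # LLM classifier over the call transcription would supply.
    "speaker_intent": lambda x: x.get("intent_score", 0.0),
    # Synthetic audio model: a near-silent background is treated as suspicious.
    "synthetic_audio": lambda x: 1.0 if x["bg_energy"] < 1e-4 else 0.0,
    # Prosody model: a flat energy contour is treated as monotone delivery.
    "prosody": lambda x: 1.0 if x["fg_energy_std"] < 1e-3 else 0.0,
}


def sine_frame(amplitude: float, cycles: int, n: int = 160) -> Frame:
    """Synthetic test frame: `cycles` full sine periods over `n` samples."""
    return [amplitude * math.sin(2 * math.pi * cycles * i / n) for i in range(n)]


# Usage: two identical loud frames (foreground) and two silent frames (background).
frames = [sine_frame(0.5, 8), sine_frame(0.5, 8), [0.0] * 160, [0.0] * 160]
fg, bg = split_foreground_background(frames, threshold=0.05)
features = {**extract_features(fg, "fg"), **extract_features(bg, "bg")}
ensemble = Ensemble(SUB_MODELS, weights={name: 1.5 for name in SUB_MODELS}, bias=-2.0)
print(round(ensemble.predict_proba(features), 3), ensemble.predict(features))
```

In this toy call, the silent background and flat foreground energy trip the synthetic-audio and prosody stubs. That pushes the fused score over 0.5 even with no intent signal. A production system would learn the weights jointly with the sub-models. Logistic stacking is only one fusion choice; majority voting and score averaging are common alternatives.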