CPC G10L 13/033 (2013.01) [G10L 15/00 (2013.01); H04M 3/51 (2013.01)] | 20 Claims
1. A system for altering speech of a speaker in an audio stream, the system comprising:
one or more memories; and
one or more processors, communicatively coupled to the one or more memories, configured to:
receive an audio stream associated with a call between a user and an individual associated with a call center;
determine, using a model, a first speech characteristic corresponding to an accent characteristic of the individual based on speech in the audio stream associated with the individual, wherein the model is trained to determine an accent characteristic based on:
reference speech data that indicates inflection points associated with an accent characteristic of the user for words of a language and corresponding inflection points associated with the accent characteristic of the individual for the words, and
reference audio data that indicates audio characteristics of respective inflection points of multiple accents associated with the accent characteristic of the user and corresponding audio characteristics of respective inflection points of multiple accents associated with the accent characteristic of the individual;
determine whether the accent characteristic of the user is different from the accent characteristic of the individual based on a comparison of the accent characteristic of the user and the accent characteristic of the individual;
process, using a speech alteration model and to form altered speech, the audio stream to alter the speech from having the first speech characteristic to having a second speech characteristic that corresponds to the accent characteristic of the user, based on determining whether the accent characteristic of the user is different from the accent characteristic of the individual,
wherein the speech alteration model is trained based on the reference audio data and the reference speech data;
replace the speech within a user channel of the audio stream with the altered speech; and
provide, via the user channel, the altered speech to the user to enable the user to listen to the speech associated with the individual according to the accent characteristic of the user.
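The flow recited in claim 1 can be illustrated with a minimal sketch. It rests on several assumptions that are not in the claim. Speech is reduced to a list of per-word inflection-point pitch offsets. A nearest-centroid classifier stands in for the trained accent model. A centroid-offset shift stands in for the speech alteration model. The names `AccentModel`, `CallStream`, `alter_speech`, and `process_call` are hypothetical and illustrative only, not the patentee's implementation. The sketch follows the claimed order: it determines the individual's accent, compares it with the user's accent, alters the speech when they differ, and replaces the user channel with the result.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class AccentModel:
    """Hypothetical stand-in for the trained accent model.

    Each accent label maps to a centroid: an assumed mean pitch offset at the
    inflection point of each reference word, in a fixed word order.
    """

    centroids: dict[str, list[float]]

    def classify(self, features: list[float]) -> str:
        # Nearest-centroid classification by Euclidean distance.
        return min(self.centroids, key=lambda label: dist(self.centroids[label], features))


@dataclass
class CallStream:
    """Hypothetical two-channel call: the individual's speech and what the user hears."""

    individual_channel: list[float]
    user_channel: list[float]


def alter_speech(features: list[float], source: str, target: str, model: AccentModel) -> list[float]:
    # Shift each inflection point by the gap between the target and source
    # accent centroids, keeping the speaker's own deviation from the source accent.
    src, tgt = model.centroids[source], model.centroids[target]
    return [f + (t - s) for f, s, t in zip(features, src, tgt)]


def process_call(stream: CallStream, user_accent: str, model: AccentModel) -> CallStream:
    individual_accent = model.classify(stream.individual_channel)
    if individual_accent != user_accent:
        # Accents differ: replace the user channel with the altered speech.
        stream.user_channel = alter_speech(
            stream.individual_channel, individual_accent, user_accent, model
        )
    else:
        # Accents match: forward a copy of the individual's speech unchanged.
        stream.user_channel = list(stream.individual_channel)
    return stream
```

As a usage example, take centroids `{"A": [0.0, 1.0, 2.0], "B": [1.0, 0.0, 3.0]}`. The individual's features `[1.1, 0.1, 2.9]` are classified as accent `B`. For a user with accent `A`, the user channel is rewritten to roughly `[0.1, 1.1, 1.9]`, which classifies as `A`. A real system would operate on audio frames and learned models rather than hand-set centroids.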