| CPC G10L 15/16 (2013.01) [G10L 2015/025 (2013.01)] | 14 Claims |

|
1. A method of generating voice in a call session, the method comprising:
extracting a plurality of features from a voice input through an artificial neural network (ANN);
identifying one or more lost audio frames within the voice input, wherein the one or more lost audio frames are lost due to at least one of vocal issues or network issues;
predicting by the ANN, for each of the one or more lost audio frames, one or more features of the respective lost audio frame;
superposing the predicted features upon the voice input to generate an updated voice input; and
correcting the updated voice input by:
    obtaining a confidence score of the updated voice input;
    splitting the updated voice input into a plurality of phonemes based on the confidence score;
    identifying one or more non-aligned phonemes out of the plurality of phonemes based on comparing the plurality of phonemes with language vocabulary knowledge;
    generating a plurality of variant phonemes; and
    updating the identified one or more non-aligned phonemes through one or more of:
        replacing the identified one or more non-aligned phonemes with the plurality of variant phonemes;
        adding additional phonemes to supplement the identified one or more non-aligned phonemes; or
        deleting the identified one or more non-aligned phonemes.
|
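
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: the ANN predictor is replaced by a simple average of the nearest received frames, "language vocabulary knowledge" is reduced to a set of known phoneme symbols, and confidence scoring and phoneme splitting are not modelled. All names (`find_lost_frames`, `predict_features`, `superpose`, `correct_phonemes`, and the `variants` and `supplements` tables) are hypothetical and chosen only for illustration.

```python
from typing import Optional

# Assumption: a frame is a feature vector, and None marks a lost frame.
Frame = Optional[list[float]]


def find_lost_frames(frames: list[Frame]) -> list[int]:
    """Return the indices of frames that were lost."""
    return [i for i, frame in enumerate(frames) if frame is None]


def predict_features(frames: list[Frame], index: int) -> list[float]:
    """Stand-in for the ANN: average the nearest received neighbours."""
    before = next((frames[i] for i in range(index - 1, -1, -1)
                   if frames[i] is not None), None)
    after = next((frames[i] for i in range(index + 1, len(frames))
                  if frames[i] is not None), None)
    neighbours = [f for f in (before, after) if f is not None]
    if not neighbours:
        raise ValueError("no received frames to predict from")
    size = len(neighbours[0])
    return [sum(f[d] for f in neighbours) / len(neighbours) for d in range(size)]


def superpose(frames: list[Frame]) -> list[list[float]]:
    """Fill each lost frame with predicted features (the 'updated voice input')."""
    updated = list(frames)
    for i in find_lost_frames(frames):
        updated[i] = predict_features(frames, i)
    return updated


def correct_phonemes(
    phonemes: list[str],
    vocabulary: set[str],
    variants: dict[str, str],
    supplements: dict[str, list[str]],
) -> list[str]:
    """Update non-aligned phonemes by replacing, supplementing, or deleting them."""
    corrected: list[str] = []
    for phoneme in phonemes:
        if phoneme in vocabulary:
            corrected.append(phoneme)                  # aligned: keep as is
        elif phoneme in variants:
            corrected.append(variants[phoneme])        # replace with a variant
        elif phoneme in supplements:
            corrected.append(phoneme)                  # keep and supplement
            corrected.extend(supplements[phoneme])
        # otherwise: delete the non-aligned phoneme
    return corrected


if __name__ == "__main__":
    frames = [[1.0, 2.0], None, [3.0, 4.0]]
    print(superpose(frames))
    print(correct_phonemes(["HH", "XX", "L", "QQ", "OW"],
                           {"HH", "EH", "L", "OW"}, {"XX": "EH"}, {}))
```

In this sketch the three update options of the claim map onto the three branches of `correct_phonemes`. A real system would choose among them using the confidence score and a language model rather than fixed lookup tables.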