CPC H04N 21/43072 (2020.08) [G10L 13/027 (2013.01); G10L 15/07 (2013.01); G10L 15/19 (2013.01); G10L 25/63 (2013.01); H04N 21/8106 (2013.01)]
21 Claims
1. A method comprising:
accessing a media asset, the media asset featuring a speaker that utters a plurality of spoken words;
determining an identity of the speaker in the media asset based on metadata associated with the media asset;
identifying vocal characteristics of the identified speaker by:
searching, based on the identity, for another media asset featuring the speaker;
extracting a voice sample featuring the speaker from the another media asset;
identifying the vocal characteristics based on the voice sample from the another media asset;
determining non-linguistic characteristics of the plurality of spoken words;
determining an emotional state expressed in the media asset featuring the speaker that utters the plurality of spoken words based on the non-linguistic characteristics; and
generating a translation of the plurality of spoken words of the media asset featuring the speaker that utters the plurality of spoken words using the identified vocal characteristics and the determined emotional state.
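The steps of claim 1 can be sketched as a small pipeline. This is a hedged, non-authoritative illustration under stated assumptions, not the patentee's implementation. The names `MediaAsset`, `identify_speaker`, `find_other_asset`, `vocal_characteristics`, `non_linguistic_features`, `emotional_state`, `generate_translation`, `TOY_LEXICON` and `EMOTION_PITCH_SCALE` are all hypothetical. Per-frame pitch and energy lists stand in for real audio analysis. A toy word lexicon stands in for machine translation. The emotion thresholds are arbitrary, and the result is a set of synthesis parameters rather than generated audio.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical illustration only, not the patentee's implementation.
# Per-frame pitch/energy lists stand in for real audio analysis, and
# TOY_LEXICON stands in for a real machine-translation system.


@dataclass
class MediaAsset:
    asset_id: str
    metadata: dict            # assumed layout, e.g. {"speaker": "..."}
    pitch_hz: list[float]     # per-frame fundamental-frequency estimates
    energy: list[float]       # per-frame loudness, normalized to 0..1
    words: list[str]          # transcribed spoken words
    duration_s: float         # length of the speech segment in seconds


TOY_LEXICON = {"hello": "hola", "friends": "amigos", "welcome": "bienvenidos"}
# Assumed pitch adjustment applied to the speaker's baseline for each emotional state.
EMOTION_PITCH_SCALE = {"excited": 1.2, "sad": 0.9, "neutral": 1.0}


def identify_speaker(asset: MediaAsset) -> str:
    # Speaker identity is read from metadata associated with the media asset.
    return asset.metadata["speaker"]


def find_other_asset(speaker: str, catalog: list, exclude_id: str):
    # Search a (hypothetical) catalog for a different asset featuring the same speaker.
    for candidate in catalog:
        if candidate.asset_id != exclude_id and candidate.metadata.get("speaker") == speaker:
            return candidate
    return None


def vocal_characteristics(sample: MediaAsset) -> dict:
    # Summarize the voice sample from the other asset as a baseline voice profile.
    return {"mean_pitch_hz": mean(sample.pitch_hz), "mean_energy": mean(sample.energy)}


def non_linguistic_features(asset: MediaAsset) -> dict:
    # Prosodic features of the spoken words: pitch, loudness and speaking rate.
    return {
        "mean_pitch_hz": mean(asset.pitch_hz),
        "mean_energy": mean(asset.energy),
        "words_per_s": len(asset.words) / asset.duration_s,
    }


def emotional_state(features: dict, baseline: dict) -> str:
    # Toy rule-based classifier with arbitrary thresholds. Pitch is compared with the
    # speaker's own baseline (an assumed design choice) so naturally high voices are not misread.
    pitch_ratio = features["mean_pitch_hz"] / baseline["mean_pitch_hz"]
    if pitch_ratio > 1.15 and features["mean_energy"] > 0.6:
        return "excited"
    if features["mean_energy"] < 0.3 and features["words_per_s"] < 2.0:
        return "sad"
    return "neutral"


def generate_translation(asset: MediaAsset, catalog: list) -> dict:
    speaker = identify_speaker(asset)
    other = find_other_asset(speaker, catalog, asset.asset_id)
    if other is None:
        raise LookupError(f"no other media asset features {speaker!r}")
    voice = vocal_characteristics(other)
    emotion = emotional_state(non_linguistic_features(asset), voice)
    text = " ".join(TOY_LEXICON.get(w.lower(), w) for w in asset.words)
    # Parameters for an assumed downstream speech synthesizer, combining the translated
    # text with the speaker's voice profile and the detected emotional state.
    return {
        "speaker": speaker,
        "text": text,
        "emotion": emotion,
        "target_pitch_hz": round(voice["mean_pitch_hz"] * EMOTION_PITCH_SCALE[emotion], 1),
    }


# Illustrative inputs, not real data.
interview = MediaAsset("a1", {"speaker": "Jane Doe"}, [200.0, 200.0, 200.0], [0.5, 0.5, 0.5], ["welcome"], 1.0)
clip = MediaAsset("a2", {"speaker": "Jane Doe"}, [240.0, 240.0, 240.0], [0.8, 0.7, 0.9], ["Hello", "friends"], 1.0)
catalog = [interview, clip]

if __name__ == "__main__":
    print(generate_translation(clip, catalog))
```

With these inputs, the clip's mean pitch (240 Hz) is 1.2 times the 200 Hz baseline taken from the other asset, and its mean energy (0.8) exceeds 0.6. The sketch therefore labels the clip "excited" and raises the target pitch for the translated text "hola amigos". In a real system, each toy stage would be replaced by speaker identification, voice-profile extraction, prosody analysis and neural translation or synthesis components.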