CPC G10L 15/183 (2013.01) [G06V 10/255 (2022.01); G10L 15/26 (2013.01); H04N 21/4884 (2013.01)] | 15 Claims |
1. An electronic apparatus comprising:
a communication interface configured to receive content comprising image data and speech data;
a memory configured to store a language contextual model trained with relevance between words;
a display; and
a processor configured to:
extract an object and a character included in the image data,
identify an object name of the object and the character,
generate a bias keyword list comprising an image-related word that is associated with the image data, based on the identified object name and the identified character,
convert the speech data to a text based on the bias keyword list and the language contextual model, and
control the display to display the text that is converted from the speech data, as a caption.
|
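The processor steps recited in claim 1 can be illustrated with a minimal sketch. The claim does not say how the bias keyword list influences the speech-to-text conversion, so the sketch assumes a simple n-best rescoring scheme: candidate transcripts arrive with scores (standing in for the output of a recognizer combined with the language contextual model), and each candidate gains a fixed boost for every word found in the bias keyword list. The function names `build_bias_keywords` and `pick_caption`, the `related_words` lookup, and the `boost` parameter are all hypothetical and do not come from the claim.

```python
from typing import Dict, Iterable, List, Tuple


def build_bias_keywords(
    object_names: Iterable[str],
    characters: Iterable[str],
    related_words: Dict[str, List[str]],
) -> List[str]:
    """Merge identified object names and on-screen characters into a keyword list.

    `related_words` is a hypothetical lookup of image-related words per term.
    """
    keywords: List[str] = []
    seen = set()
    for term in list(object_names) + list(characters):
        # Each identified term contributes itself plus any associated words.
        for word in [term] + related_words.get(term.lower(), []):
            key = word.lower()
            if key not in seen:
                seen.add(key)
                keywords.append(key)
    return keywords


def pick_caption(
    hypotheses: List[Tuple[str, float]],
    bias_keywords: List[str],
    boost: float = 1.0,
) -> str:
    """Return the candidate transcript with the best score after keyword boosting.

    Assumption: `hypotheses` holds (text, score) pairs where a higher score is better.
    """
    bias = set(bias_keywords)

    def biased_score(item: Tuple[str, float]) -> float:
        text, score = item
        # Count tokens in the candidate that appear in the bias keyword list.
        hits = sum(1 for token in text.lower().split() if token in bias)
        return score + boost * hits

    return max(hypotheses, key=biased_score)[0]


# Example: the image shows a football and the on-screen text "FINAL".
keywords = build_bias_keywords(
    ["football"], ["FINAL"], {"football": ["goal", "striker"]}
)
candidates = [("he scored a gold", -4.0), ("he scored a goal", -4.3)]
caption = pick_caption(candidates, keywords)
```

In this example the unbiased scores favor "he scored a gold", but the keyword "goal" derived from the image lifts the second candidate, so it is the one selected for display as the caption. A production recognizer would more likely apply the bias during decoding rather than after it; rescoring is used here only because it is short to show.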