CPC G06V 30/153 (2022.01) [G06N 3/08 (2013.01); G06V 20/00 (2022.01); G10L 15/00 (2013.01); G06V 30/10 (2022.01)] | 14 Claims |
1. An electronic device comprising:
a communication interface comprising circuitry;
a memory storing at least one instruction; and
a processor configured to execute the at least one instruction,
wherein the processor is configured to:
obtain a content via the communication interface,
obtain information on a text included in an image of the content,
obtain caption data of the content by performing speech recognition for speech data included in the content based on the information on the text included in the image of the content, and
perform the speech recognition for the speech data by applying a weight to each of an appearance time of the text, an appearance position of the text and a size of the text included in the image of the content obtained by analyzing image data included in the content.
|
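As an illustration only, the weighting recited in the final limitation of claim 1 could be sketched as follows. The claim does not specify how the weights are computed or combined, so everything here is an assumption made for the example: the names `OnScreenText`, `text_weight` and `rescore`, the reading of "appearance time" as on-screen duration, the preference for centred text, the linear combination, and all numeric defaults. The sketch scores each piece of on-screen text from its appearance time, position and size, then uses those scores to favour speech recognition hypotheses that contain the text.

```python
from dataclasses import dataclass


@dataclass
class OnScreenText:
    """Hypothetical record for one word detected in the image data."""
    word: str
    duration_s: float   # appearance time, assumed here to mean on-screen duration
    center_y: float     # appearance position: 0.0 = top of frame, 1.0 = bottom
    height_frac: float  # size: text height divided by frame height


def text_weight(t, w_time=0.4, w_pos=0.3, w_size=0.3,
                max_duration_s=10.0, max_height_frac=0.2):
    """Combine the three cues into one weight in [0, 1] (assumed linear mix)."""
    time_score = min(t.duration_s / max_duration_s, 1.0)
    # Assumption: text nearer the vertical centre is treated as more relevant.
    pos_score = 1.0 - abs(t.center_y - 0.5) * 2.0
    size_score = min(t.height_frac / max_height_frac, 1.0)
    return w_time * time_score + w_pos * pos_score + w_size * size_score


def rescore(hypotheses, texts, bias=1.0):
    """Pick the hypothesis with the best biased score.

    hypotheses: list of (sentence, log_probability) pairs from a recognizer.
    texts: OnScreenText items obtained from the image analysis.
    """
    weights = {}
    for t in texts:
        key = t.word.lower()
        # Keep the largest weight if the same word is detected more than once.
        weights[key] = max(weights.get(key, 0.0), text_weight(t))

    def score(hypothesis):
        sentence, log_prob = hypothesis
        bonus = sum(weights.get(w, 0.0) for w in sentence.lower().split())
        return log_prob + bias * bonus

    return max(hypotheses, key=score)[0]


# Usage: a large, centred, long-lived caption biases recognition toward its word.
texts = [OnScreenText("Kubernetes", duration_s=8.0, center_y=0.5, height_frac=0.1)]
hypotheses = [("welcome to cooper netties", -4.0),
              ("welcome to kubernetes", -4.5)]
best = rescore(hypotheses, texts)
```

Rescoring an n-best list is only one way to apply such weights. A recognizer could equally apply them inside decoding, for example as a bias on the language model; the claim language does not restrict the mechanism.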