| CPC G06V 20/70 (2022.01) [G06F 40/44 (2020.01); G06T 7/90 (2017.01); G06V 10/44 (2022.01); G06N 3/02 (2013.01); G06T 2207/10024 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01); G06V 10/768 (2022.01); G06V 2201/07 (2022.01)] | 7 Claims |

|
1. An apparatus for automatically generating an image caption, the apparatus comprising:
an automatic caption generation module configured to generate a caption by applying a deep learning algorithm to an image received from a client;
a caption basis generation module configured to generate a basis for the caption by mapping a partial area in the image received from the client with respect to important words in the caption received from the automatic caption generation module; and
a visualization module configured to visualize the caption received from the automatic caption generation module and the basis for the caption received from the caption basis generation module to return the visualized caption and basis to the client,
wherein the caption basis generation module includes:
an object recognition module configured to recognize one or more objects included in the image received from the client and extract one or more object areas;
an image area-word mapping module configured to train a relevance between words in the caption generated by the automatic caption generation module and each of the object areas extracted by the object recognition module using a deep learning algorithm, and output a weight matrix as a result of the training; and
an interpretation reinforcement module configured to extract a word having a highest weight for each object area from the weight matrix received from the image area-word mapping module, and calculate a posterior probability for each word.
|
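The final step of claim 1, in which the interpretation reinforcement module picks the highest-weight word for each object area and computes a posterior probability for it, can be illustrated with a minimal sketch. The claim does not state how the posterior probability is calculated. The sketch assumes a softmax over each row of the weight matrix, read as P(word | object area). The function names `softmax` and `top_word_per_area`, the row-per-area layout of the matrix, and the sample words and weights are all hypothetical and are not taken from the patent.

```python
from math import exp
from typing import List, Tuple


def softmax(row: List[float]) -> List[float]:
    """Normalize one row of relevance weights into a distribution over words.

    Assumption: the claim does not define the posterior; softmax is used
    here only as one plausible normalization.
    """
    peak = max(row)  # subtract the row maximum for numerical stability
    exps = [exp(w - peak) for w in row]
    total = sum(exps)
    return [e / total for e in exps]


def top_word_per_area(
    weights: List[List[float]], words: List[str]
) -> List[Tuple[str, float]]:
    """For each object area (one row of `weights`), return the word with the
    highest weight together with its assumed posterior P(word | area)."""
    results = []
    for row in weights:
        probs = softmax(row)
        best = max(range(len(row)), key=row.__getitem__)  # index of highest weight
        results.append((words[best], probs[best]))
    return results


# Hypothetical data: 2 object areas (rows) and 3 caption words (columns).
words = ["dog", "frisbee", "grass"]
weights = [
    [2.0, 0.5, 0.1],  # object area 0
    [0.2, 1.5, 0.3],  # object area 1
]
result = top_word_per_area(weights, words)
for area_index, (word, prob) in enumerate(result):
    print(f"area {area_index}: {word} (p={prob:.3f})")
```

In this sketch each object area is paired with a single word and a confidence value, which is the kind of information the visualization module could display alongside the caption as its basis.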