CPC G06V 20/70 (2022.01) [G06T 7/70 (2017.01); G06V 10/25 (2022.01); G06V 10/44 (2022.01); G06V 10/462 (2022.01); G06V 10/806 (2022.01); G06V 2201/07 (2022.01)]
18 Claims
1. An image description generation method, comprising:
acquiring one or more image region features in a target image, and obtaining a current input vector by performing a mean pooling on the image region features;
obtaining respective outer product vectors of the image region features by respectively linearly fusing the current input vector and each of the image region features;
calculating, based on the respective outer product vectors of the image region features, an attention distribution of the image region features in a spatial dimension and an attention distribution of the image region features in a channel dimension; and
generating an image description of the target image based on the attention distribution of the image region features in the spatial dimension and the attention distribution of the image region features in the channel dimension.
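The following NumPy sketch makes the data flow of claim 1 concrete. It is not the patented implementation. The claim does not say how the "linear fusion" or the two attention distributions are computed, so the sketch assumes an X-Linear-style low-rank bilinear pooling: the query and each region are linearly projected, passed through ReLU, and multiplied element-wise. Spatial attention is taken as a softmax over regions, and channel attention as a sigmoid gate over the region-averaged fused vectors. The function names, the parameter names (`W_q`, `W_k`, `w_s`, `W_c`, `W_vq`, `W_v`), the sizes, and the random weights are all hypothetical. The final decoding step that turns the attended vector into words is omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def relu(x):
    return np.maximum(x, 0.0)


def init_params(d, d_h, seed=0):
    # Hypothetical, randomly initialised projection weights (small scale).
    rng = np.random.default_rng(seed)
    shapes = {"W_q": (d, d_h), "W_k": (d, d_h), "w_s": (d_h,),
              "W_c": (d_h, d_h), "W_vq": (d, d_h), "W_v": (d, d_h)}
    return {k: 0.1 * rng.normal(size=s) for k, s in shapes.items()}


def region_attention(V, p):
    """V: (N, D) region features. Returns spatial (N,), channel (D_h,), attended (D_h,)."""
    # Step 1: current input vector = mean pooling over the N regions.
    q = V.mean(axis=0)                                   # (D,)
    # Step 2 (assumed fusion): element-wise product of projected query and
    # each projected region, a low-rank stand-in for an outer product.
    B = relu(q @ p["W_q"])[None, :] * relu(V @ p["W_k"])  # (N, D_h)
    # Step 3a: spatial attention, one weight per region, summing to 1.
    spatial = softmax(B @ p["w_s"], axis=0)              # (N,)
    # Step 3b: channel attention, one gate in (0, 1) per channel.
    channel = sigmoid(B.mean(axis=0) @ p["W_c"])         # (D_h,)
    # Combine: spatially weighted sum of fused values, then channel gating.
    values = relu(q @ p["W_vq"])[None, :] * relu(V @ p["W_v"])  # (N, D_h)
    attended = channel * (spatial @ values)              # (D_h,)
    return spatial, channel, attended


# Usage: 5 hypothetical regions with 8-dimensional features.
N, D, D_h = 5, 8, 6
V = np.random.default_rng(1).normal(size=(N, D))
spatial, channel, attended = region_attention(V, init_params(D, D_h))
```

In a full captioning model the `attended` vector would typically be fed to a language decoder at each word step, with the mean-pooled query replaced by the decoder state. That substitution is likewise an assumption rather than something claim 1 specifies.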