US 12,073,639 B2
Image description generation method, apparatus and system, and medium and electronic device
Yingwei Pan, Beijing (CN); Yehao Li, Beijing (CN); Ting Yao, Beijing (CN); and Tao Mei, Beijing (CN)
Assigned to BEIJING JINGDONG SHANGKE INFORMATION TECHNOLOGY CO., LTD., Beijing (CN); and BEIJING JINGDONG CENTURY TRADING CO., LTD., Beijing (CN)
Appl. No. 17/754,601
Filed by BEIJING JINGDONG SHANGKE INFORMATION TECHNOLOGY CO., LTD., Beijing (CN); and BEIJING JINGDONG CENTURY TRADING CO., LTD., Beijing (CN)
PCT Filed Mar. 2, 2021, PCT No. PCT/CN2021/078673
§ 371(c)(1), (2) Date Apr. 7, 2022,
PCT Pub. No. WO2021/190257, PCT Pub. Date Sep. 30, 2021.
Claims priority of application No. 202010231097.2 (CN), filed on Mar. 27, 2020.
Prior Publication US 2023/0014105 A1, Jan. 19, 2023
Int. Cl. G06V 20/70 (2022.01); G06T 7/70 (2017.01); G06V 10/25 (2022.01); G06V 10/44 (2022.01); G06V 10/46 (2022.01); G06V 10/80 (2022.01)
CPC G06V 20/70 (2022.01) [G06T 7/70 (2017.01); G06V 10/25 (2022.01); G06V 10/44 (2022.01); G06V 10/462 (2022.01); G06V 10/806 (2022.01); G06V 2201/07 (2022.01)]
18 Claims
 
1. An image description generation method, comprising:
acquiring one or more image region features in a target image, and obtaining a current input vector by performing a mean pooling on the image region features;
obtaining respective outer product vectors of the image region features by respectively linearly fusing the current input vector and each of the image region features;
calculating, based on the respective outer product vectors of the image region features, an attention distribution of the image region features in a spatial dimension and an attention distribution of the image region features in a channel dimension; and
generating an image description of the target image based on the attention distribution of the image region features in the spatial dimension and the attention distribution of the image region features in the channel dimension.
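
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not fix how the "linear fusing" or the two attention distributions are computed, so the sketch assumes that fusion is an element-wise product of two linear projections, that spatial attention is a softmax over one score per region, and that channel attention is a sigmoid gate over the region-averaged fused vectors. All function names and weight matrices (`fuse`, `attend`, `w_q`, `w_k`, `w_s`, `w_c`) are hypothetical, and the decoder that would turn the attended feature into a sentence is omitted.

```python
import math
import random


def mean_pool(vectors):
    """Average N vectors of equal length into a single vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def linear(x, weight):
    """Apply a linear map; `weight` is a list of output rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fuse(query, region, w_q, w_k):
    """Assumed fusion: element-wise product of two linear projections."""
    q = linear(query, w_q)
    k = linear(region, w_k)
    return [a * b for a, b in zip(q, k)]


def attend(regions, w_q, w_k, w_s, w_c):
    """Return (spatial attention, channel attention, attended feature)."""
    # Step 1: the current input vector is the mean of the region features.
    query = mean_pool(regions)
    # Step 2: one fused vector per region.
    fused = [fuse(query, region, w_q, w_k) for region in regions]
    # Step 3a: spatial attention, one weight per region (sums to 1).
    spatial = softmax([sum(w * f for w, f in zip(w_s, vec)) for vec in fused])
    # Step 3b: channel attention, one gate in (0, 1) per channel.
    channel = [sigmoid(v) for v in linear(mean_pool(fused), w_c)]
    # Step 4 input: spatially weighted sum of fused vectors, gated per channel.
    # Reusing `fused` as the values is a simplification made for brevity.
    attended = [
        channel[j] * sum(spatial[i] * fused[i][j] for i in range(len(fused)))
        for j in range(len(channel))
    ]
    return spatial, channel, attended


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    dim, hidden, num_regions = 4, 3, 5
    regions = random_matrix(num_regions, dim, rng)
    spatial, channel, attended = attend(
        regions,
        random_matrix(hidden, dim, rng),
        random_matrix(hidden, dim, rng),
        [rng.uniform(-1.0, 1.0) for _ in range(hidden)],
        random_matrix(hidden, hidden, rng),
    )
    print(spatial, channel, attended)
```

In this sketch the spatial distribution answers "which regions matter" and the channel distribution answers "which feature dimensions matter"; a caption decoder would take `attended` as its visual context at each generation step.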