CPC G06V 10/467 (2022.01) [G06N 3/045 (2023.01); G06V 30/418 (2022.01)] | 20 Claims |
1. A difference description statement generation method, comprising:
encoding a target image and target text respectively, and performing feature concatenation on an image encoding feature and a text encoding feature that are obtained by encoding, to obtain a concatenated encoding feature;
inputting the concatenated encoding feature to a preset image-text alignment unit constructed based on a preset self-attention mechanism to perform image-text alignment processing, to obtain a concatenated alignment feature;
splitting the concatenated alignment feature to obtain an image alignment feature and a text alignment feature, and inputting the image alignment feature, the text encoding feature, and the text alignment feature to a preset noise monitoring unit constructed based on the preset self-attention mechanism and a preset cross-attention mechanism to perform processing, to extract a difference signal between the target image and the target text; and
generating a difference description statement based on the difference signal by using a preset difference description generation algorithm.
|
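The data flow recited in claim 1 (concatenate, align with self-attention, split, then compare through self-attention and cross-attention) can be illustrated with a minimal NumPy sketch. This is not the patented implementation. The function names `attention` and `extract_difference_signal` are hypothetical, the attention has no learned projections, the encoders and the description generation algorithm are omitted, and the way the noise monitoring step combines its two attention outputs (a simple subtraction) is an assumption made only for illustration, since the claim does not specify it.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(query, key, value):
    """Scaled dot-product attention without learned projections (simplification)."""
    scores = query @ key.T / np.sqrt(query.shape[-1])
    return softmax(scores, axis=-1) @ value


def extract_difference_signal(image_feat, text_feat):
    """Hypothetical sketch of the concatenate / align / split / monitor flow.

    image_feat: (n_img, d) image encoding feature
    text_feat:  (n_txt, d) text encoding feature
    Returns an (n_txt, d) array standing in for the difference signal.
    """
    n_img = image_feat.shape[0]

    # Feature concatenation along the token axis.
    concat = np.concatenate([image_feat, text_feat], axis=0)

    # Image-text alignment: self-attention over the concatenated feature.
    aligned = attention(concat, concat, concat)

    # Split back into an image alignment feature and a text alignment feature.
    image_aligned, text_aligned = aligned[:n_img], aligned[n_img:]

    # Noise monitoring (ASSUMED form): self-attention over the text encoding
    # feature, cross-attention from aligned text tokens to aligned image tokens.
    text_self = attention(text_feat, text_feat, text_feat)
    text_cross = attention(text_aligned, image_aligned, image_aligned)

    # ASSUMPTION: the per-token gap between the two views is used as the signal.
    return text_self - text_cross


# Random arrays stand in for the outputs of the image and text encoders.
rng = np.random.default_rng(0)
image_feat = rng.normal(size=(4, 8))  # 4 image tokens, feature width 8
text_feat = rng.normal(size=(3, 8))   # 3 text tokens, feature width 8
signal = extract_difference_signal(image_feat, text_feat)
```

In this sketch the signal has one row per text token, so a downstream generator could in principle attend to the tokens whose rows deviate most. That reading is also an assumption and is not stated in the claim.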