CPC G06F 16/5846 (2019.01) [G06F 16/353 (2019.01); G06F 40/30 (2020.01)] | 20 Claims |
1. A method for training an image-text mutual retrieval model, comprising:
acquiring training data pairs, wherein the training data pairs comprise text training data and image training data, the text training data comprises long text data, the long text data is text data containing a plurality of target texts, and the target text is a sentence or a phrase;
inputting the training data pairs into an initial model, and extracting text coding features of the text training data by using a text coding module in the initial model and image coding features of the image training data by using an image coding module in the initial model, respectively, wherein the text coding module comprises multi-layer Long Short-Term Memory (LSTM) networks, the multi-layer LSTM networks comprising a first LSTM network layer and a second LSTM network layer, the first LSTM network layer being configured to acquire a feature of each target text based on a feature of each word in each target text, and the second LSTM network layer being configured to acquire a feature of the long text data based on the feature of each target text;
calculating a training loss based on the text coding features and the image coding features, and performing parameter adjustment on the initial model based on the training loss; and
in response to the training loss meeting a convergence condition, determining the initial model after the parameter adjustment as the image-text mutual retrieval model.
|
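
The two-level text coding module recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the class `LSTMLayer`, the function `encode_long_text`, the dimensions, the random untrained weights, and the choice of the final hidden state as the feature of a sequence are all assumptions made for illustration. The claim specifies only that a first LSTM layer turns word features into a feature per target text, and a second LSTM layer turns those into a feature of the long text data.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMLayer:
    """A single LSTM layer with random, untrained weights (hypothetical helper)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias per gate:
        # i = input, f = forget, g = cell candidate, o = output.
        self.W = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                   for _ in range(hidden_size)]
            for gate in "ifgo"
        }
        self.b = {gate: [0.0] * hidden_size for gate in "ifgo"}

    def _linear(self, gate, xh):
        return [sum(w * v for w, v in zip(row, xh)) + b
                for row, b in zip(self.W[gate], self.b[gate])]

    def encode(self, sequence):
        """Run over a sequence of vectors and return the final hidden state.

        Using the final hidden state as the sequence feature is an assumption.
        """
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            xh = list(x) + h  # concatenate current input and previous hidden state
            i = [sigmoid(v) for v in self._linear("i", xh)]
            f = [sigmoid(v) for v in self._linear("f", xh)]
            g = [math.tanh(v) for v in self._linear("g", xh)]
            o = [sigmoid(v) for v in self._linear("o", xh)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


def encode_long_text(long_text, word_lstm, text_lstm):
    """long_text is a list of target texts; each target text is a list of word vectors."""
    # First LSTM layer: word features -> one feature per target text.
    target_features = [word_lstm.encode(words) for words in long_text]
    # Second LSTM layer: target-text features -> feature of the long text data.
    return text_lstm.encode(target_features)


rng = random.Random(0)
EMBED, HIDDEN = 4, 3  # illustrative sizes only
word_lstm = LSTMLayer(EMBED, HIDDEN, rng)
text_lstm = LSTMLayer(HIDDEN, HIDDEN, rng)

# A toy "long text" with three target texts of 5, 2 and 7 words,
# each word represented by a random embedding vector.
long_text = [
    [[rng.uniform(-1.0, 1.0) for _ in range(EMBED)] for _ in range(n_words)]
    for n_words in (5, 2, 7)
]
feature = encode_long_text(long_text, word_lstm, text_lstm)
```

In this sketch, target texts of different lengths all reduce to vectors of the same size before the second layer sees them, which is what lets the second layer treat the long text as a sequence of sentences or phrases rather than one flat sequence of words. Training, the image coding module, and the loss computation of the later claim steps are omitted here.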