| CPC G06F 16/535 (2019.01) [G06F 16/538 (2019.01); G06N 3/08 (2013.01)] | 20 Claims |

|
1. A computer-implemented method comprising:
using a server computer, obtaining from a client computer a text input comprising one or more first unigrams in a query from a user;
accessing in digital data storage coupled to the server computer a plurality of digital images, each of the plurality of digital images comprising one or more definition unigrams;
training a deep learning model to map the one or more first unigrams to first vector representations for the text input and to map the one or more definition unigrams to second vector representations for the plurality of digital images, the deep learning model being a dual encoder model comprising a text encoder and an image encoder based on a ranking loss function;
determining, using the deep learning model, the first vector representations of the text input by mapping the one or more first unigrams of the text input to the first vector representations for the text input;
determining, using the deep learning model, a first embedding of the first vector representations of the text input in a multi-dimensional embedding space based on a combination of the first vector representations of the text input;
determining, using the deep learning model, the second vector representations of each of the plurality of digital images by mapping the one or more definition unigrams of each of the plurality of digital images to the second vector representations for the plurality of digital images;
determining, using the deep learning model, a second embedding of the second vector representations of each of the plurality of digital images in the multi-dimensional embedding space based on a combination of the second vector representations of a corresponding image;
identifying one or more relevant images based on a respective similarity of the first embedding to the second embedding;
determining one or more information terms for each of the one or more relevant images, an image informativeness value for each of the one or more relevant images based on the one or more information terms, and a confidence score for each of the one or more information terms; and
transmitting, to the client computer in response to obtaining the text input, instructions for presenting a user interface comprising the one or more relevant images and the confidence score for each of the one or more information terms for each of the one or more relevant images.
|
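
The retrieval steps recited in claim 1 (mapping unigrams to vectors, combining them into an embedding, and identifying relevant images by similarity) can be illustrated with a minimal sketch. It is not the claimed implementation. The vector table `UNIGRAM_VECTORS`, the averaging used as the "combination", the cosine similarity measure, the hinge-style form of the ranking loss, and all function and image names are assumptions introduced for illustration only. The claim does not specify any of them, and a trained dual encoder would supply the vectors in practice.

```python
import math

# Hypothetical toy vectors standing in for the output of a trained dual encoder.
UNIGRAM_VECTORS = {
    "red": [1.0, 0.0, 0.0],
    "car": [0.0, 1.0, 0.0],
    "blue": [0.0, 0.0, 1.0],
    "sky": [0.0, 0.5, 0.5],
}


def embed(unigrams):
    """Combine per-unigram vectors into one embedding by averaging (assumed combination)."""
    vectors = [UNIGRAM_VECTORS[u] for u in unigrams if u in UNIGRAM_VECTORS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(query_unigrams, images, top_k=1):
    """Return the top_k (image_id, similarity) pairs for the query, best first."""
    query_embedding = embed(query_unigrams)
    scored = [
        (cosine(query_embedding, embed(definition_unigrams)), image_id)
        for image_id, definition_unigrams in images.items()
    ]
    scored.sort(reverse=True)
    return [(image_id, score) for score, image_id in scored[:top_k]]


def margin_ranking_loss(positive_score, negative_score, margin=0.5):
    """Hinge-style ranking loss (assumed form): zero once the matching pair
    outscores the non-matching pair by at least the margin."""
    return max(0.0, margin - positive_score + negative_score)


# Hypothetical image store: image id -> definition unigrams.
IMAGES = {
    "img_car": ["red", "car"],
    "img_sky": ["blue", "sky"],
}

result = rank_images(["red", "car"], IMAGES)
```

Here `result` holds the single best-matching image for the query unigrams `["red", "car"]`. The informativeness value and the per-term confidence scores recited in the claim are not modeled in this sketch.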