CPC G06F 16/9032 (2019.01) [G06F 16/90332 (2019.01); G06F 16/9038 (2019.01); G06F 17/16 (2013.01); G06F 18/2148 (2023.01); G06F 18/22 (2023.01); G06V 10/761 (2022.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01)] | 20 Claims |
1. A system for training a cross-modal search system, comprising:
a training dataset including first objects of a first modality and second objects of a second modality that are associated with the first objects, respectively,
wherein the first modality is different than the second modality, and
wherein the second objects include text that is descriptive of the first objects;
a first matrix including first relevance values indicative of relevance between the first objects and the second objects, respectively;
a second matrix including second relevance values indicative of relevance between the second objects and the first objects, respectively; and
a training module configured to:
split an embedding space into a plurality of bins, each of the bins corresponding to a predetermined range of distances from a query encoded into the embedding space;
assign ones of the second objects to ones of the bins based on distances between the ones of the second objects and a query encoded into the embedding space;
determine a ranking measure based on a number of the ones of the second objects assigned to the bins;
determine losses based on the ranking measure and the first and second matrices;
determine a final loss based on the losses;
train embedding functions based on the final loss; and
store the embedding functions in memory of a search module for cross-modal retrieval in response to receipt of search queries.
|
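The binning and ranking steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: it assumes Euclidean distance, equal-width bins, hard bin assignment, a histogram-based average precision as the ranking measure, and a simple sum of the two directional losses as the final loss. The names `quantized_ap`, `symmetric_loss`, `num_bins`, and `max_dist` are hypothetical. Hard assignment is not differentiable, so actual training of the embedding functions would likely need a soft assignment in its place.

```python
import numpy as np


def quantized_ap(query, items, relevance, num_bins=10, max_dist=2.0):
    """Histogram-binned average precision for one query (illustrative only)."""
    relevance = np.asarray(relevance, dtype=float)
    # Distance from the encoded query to every item in the embedding space.
    dists = np.linalg.norm(np.asarray(items) - np.asarray(query), axis=1)
    # Each bin covers a predetermined range of distances from the query.
    edges = np.linspace(0.0, max_dist, num_bins + 1)
    # Assign items to bins; distances >= max_dist fall into the last bin.
    bin_idx = np.clip(np.digitize(dists, edges) - 1, 0, num_bins - 1)
    # Number of items, and of relevant items, assigned to each bin.
    total = np.bincount(bin_idx, minlength=num_bins).astype(float)
    relevant = np.bincount(bin_idx, weights=relevance, minlength=num_bins)
    # Precision at each bin boundary, weighted by the relevant mass in the bin.
    cum_total = np.cumsum(total)
    cum_rel = np.cumsum(relevant)
    precision = np.divide(
        cum_rel, cum_total, out=np.zeros(num_bins), where=cum_total > 0
    )
    n_rel = relevance.sum()
    if n_rel == 0:
        return 0.0
    return float(np.sum(precision * relevant) / n_rel)


def symmetric_loss(first_emb, second_emb, rel_first, rel_second, **kw):
    """Sum of two directional losses (the sum is an assumption of this sketch).

    rel_first[i][j]:  relevance of second object j to first object i.
    rel_second[j][i]: relevance of first object i to second object j.
    """
    # First-modality objects act as queries over the second-modality objects.
    loss_a = np.mean(
        [1.0 - quantized_ap(q, second_emb, rel_first[i], **kw)
         for i, q in enumerate(first_emb)]
    )
    # Second-modality objects act as queries over the first-modality objects.
    loss_b = np.mean(
        [1.0 - quantized_ap(q, first_emb, rel_second[j], **kw)
         for j, q in enumerate(second_emb)]
    )
    return float(loss_a + loss_b)
```

In this sketch each directional loss is one minus the mean binned average precision, so the combined value is zero when every relevant object lands in a nearer bin than every irrelevant one, in both retrieval directions.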