US 12,135,747 B2
Cross-modal search systems and methods
Diane Larlus, La Tronche (FR); Jon Almazan, London (GB); Cesar De Souza, Grenoble (FR); Naila Murray, Grenoble (FR); and Rafael Sampaio De Rezende, Grenoble (FR)
Assigned to NAVER CORPORATION, Gyeonggi-do (KR)
Filed by NAVER CORPORATION, Gyeonggi-do (KR)
Filed on Jun. 29, 2023, as Appl. No. 18/216,220.
Application 18/216,220 is a division of application No. 16/791,368, filed on Feb. 14, 2020, granted, now 11,734,352.
Prior Publication US 2023/0350951 A1, Nov. 2, 2023
Int. Cl. G06F 16/9032 (2019.01); G06F 16/9038 (2019.01); G06F 17/16 (2006.01); G06F 18/214 (2023.01); G06F 18/22 (2023.01); G06V 10/74 (2022.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01)
CPC G06F 16/9032 (2019.01) [G06F 16/90332 (2019.01); G06F 16/9038 (2019.01); G06F 17/16 (2013.01); G06F 18/2148 (2023.01); G06F 18/22 (2023.01); G06V 10/761 (2022.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01)]
20 Claims
 
1. A system for training a cross-modal search system, comprising:
a training dataset including first objects of a first modality and second objects of a second modality that are associated with the first objects, respectively,
wherein the first modality is different than the second modality, and
wherein the second objects include text that is descriptive of the first objects;
a first matrix including first relevance values indicative of relevance between the first objects and the second objects, respectively;
a second matrix including second relevance values indicative of relevance between the second objects and the first objects, respectively; and
a training module configured to:
split an embedding space into a plurality of bins, each of the bins corresponding to a predetermined range of distances from a query encoded into the embedding space;
assign ones of the second objects to ones of the bins based on distances between the ones of the second objects and a query encoded into the embedding space;
determine a ranking measure based on a number of the ones of the second objects assigned to the bins;
determine losses based on the ranking measure and the first and second matrices;
determine a final loss based on the losses;
train embedding functions based on the final loss; and
store the embedding functions in memory of a search module for cross-modal retrieval in response to receipt of search queries.
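
The binning and ranking-measure steps recited above can be illustrated with a minimal sketch. It is not the claimed implementation: the claim does not name a specific ranking measure, so the histogram-based approximation of average precision used here is an assumption, as are the function names `assign_to_bins` and `binned_average_precision`, the use of NumPy, and the binary relevance labels (which stand in for one row of a relevance matrix). Hard bin assignment is used for readability; gradient-based training of embedding functions would likely need a soft, differentiable assignment instead.

```python
import numpy as np


def assign_to_bins(distances, bin_edges):
    """Map each distance to the index of the bin whose range contains it.

    bin_edges has n_bins + 1 increasing values; bin i covers
    [bin_edges[i], bin_edges[i + 1]). Out-of-range distances are clipped
    into the first or last bin (an assumption made for this sketch).
    """
    idx = np.digitize(distances, bin_edges) - 1
    return np.clip(idx, 0, len(bin_edges) - 2)


def binned_average_precision(distances, relevant, bin_edges):
    """Approximate average precision from per-bin counts (hypothetical measure).

    distances: distance from the encoded query to each candidate object.
    relevant:  1 if the candidate is relevant to the query, else 0.
    """
    n_bins = len(bin_edges) - 1
    bins = assign_to_bins(distances, bin_edges)

    # Number of candidates, and of relevant candidates, assigned to each bin.
    total = np.bincount(bins, minlength=n_bins).astype(float)
    pos = np.bincount(bins, weights=np.asarray(relevant, dtype=float),
                      minlength=n_bins)

    # Precision after including every bin up to and including bin i.
    cum_total = np.cumsum(total)
    cum_pos = np.cumsum(pos)
    precision = np.divide(cum_pos, cum_total,
                          out=np.zeros(n_bins), where=cum_total > 0)

    n_rel = pos.sum()
    if n_rel == 0:
        return 0.0
    # Weight each bin's precision by the relevant items it contributes.
    return float(np.sum(precision * pos) / n_rel)


# Example: four candidates, four equal-width bins over distances in [0, 1].
distances = np.array([0.1, 0.3, 0.6, 0.9])
relevant = np.array([1, 0, 1, 0])
bin_edges = np.linspace(0.0, 1.0, 5)
score = binned_average_precision(distances, relevant, bin_edges)
```

In this example each candidate falls into its own bin, so the binned value coincides with the exact average precision of the ranking (5/6). A loss such as `1 - score`, computed once per retrieval direction, would be one hypothetical way to combine such a measure with the first and second relevance matrices.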