US 12,249,116 B2
Concept disambiguation using multimodal embeddings
Venkata Naveen Kumar Yadav Marri, Fremont, CA (US); and Ajinkya Gorakhnath Kale, San Jose, CA (US)
Assigned to ADOBE INC., San Jose, CA (US)
Filed by ADOBE INC., San Jose, CA (US)
Filed on Mar. 23, 2022, as Appl. No. 17/656,147.
Prior Publication US 2023/0326178 A1, Oct. 12, 2023
Int. Cl. G06V 10/771 (2022.01); G06N 3/088 (2023.01); G06V 10/74 (2022.01); G06V 10/77 (2022.01); G06V 10/774 (2022.01); G06V 10/82 (2022.01)
CPC G06V 10/761 (2022.01) [G06N 3/088 (2013.01); G06V 10/771 (2022.01); G06V 10/7715 (2022.01); G06V 10/774 (2022.01); G06V 10/82 (2022.01)]
20 Claims
 
1. A method for image processing, comprising:
identifying a plurality of candidate concepts in a knowledge graph (KG) that correspond to an image tag of an image, wherein the knowledge graph comprises a plurality of nodes corresponding to the plurality of candidate concepts;
generating an image embedding of the image using a multi-modal encoder;
generating a text embedding for each of the plurality of candidate concepts using the multi-modal encoder used to generate the image embedding, wherein the image embedding and the text embedding are located in a same embedding space;
selecting a matching concept from the plurality of candidate concepts based on the image embedding and the text embedding;
generating association data between the image and the matching concept; and
transmitting information from the knowledge graph corresponding to the image based on the association data between the image and the matching concept.
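The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything named here is a hypothetical stand-in: the two-node knowledge graph `KG`, the `ToyMultiModalEncoder` (a bag-of-words toy that merely places "images" and text in one shared vector space, where a real system would use a trained multi-modal encoder), and the use of cosine similarity as the selection criterion, which the claim does not specify.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    node_id: str
    label: str
    description: str


# Hypothetical toy knowledge graph: two nodes that share the label "jaguar".
KG = {
    "Q1": Concept("Q1", "jaguar", "large wild cat animal jungle"),
    "Q2": Concept("Q2", "jaguar", "luxury car vehicle road"),
}

# Shared vocabulary that defines the toy embedding space (an assumption).
VOCAB = ["cat", "animal", "jungle", "car", "vehicle", "road"]


def _normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class ToyMultiModalEncoder:
    """Stand-in encoder: maps text and 'images' into the same vector space."""

    def encode_text(self, text):
        words = text.lower().split()
        return _normalize([float(words.count(w)) for w in VOCAB])

    def encode_image(self, image):
        # An 'image' is a dict whose 'visual_words' list stands in for pixels.
        words = image["visual_words"]
        return _normalize([float(words.count(w)) for w in VOCAB])


def cosine(a, b):
    # Inputs are already unit-length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def candidate_concepts(kg, tag):
    """Identify the KG nodes whose label corresponds to the image tag."""
    return [c for c in kg.values() if c.label == tag.lower()]


def disambiguate(image, tag, kg, encoder):
    """Select the matching concept and return association data, or None."""
    candidates = candidate_concepts(kg, tag)
    if not candidates:
        return None
    image_emb = encoder.encode_image(image)
    scored = [
        (cosine(image_emb, encoder.encode_text(c.description)), c)
        for c in candidates
    ]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return {"image_id": image["id"], "node_id": best.node_id, "score": best_score}


def kg_info(kg, association):
    """Look up the KG information to transmit for the associated image."""
    concept = kg[association["node_id"]]
    return {
        "image_id": association["image_id"],
        "concept": concept.label,
        "description": concept.description,
    }


image = {"id": "img-1", "visual_words": ["cat", "jungle"]}
association = disambiguate(image, "jaguar", KG, ToyMultiModalEncoder())
info = kg_info(KG, association)
```

In this toy run the tag "jaguar" matches both nodes, and the image content places it closer to the animal node than to the car node, so the association data points at `Q1`. Swapping in a real encoder would change only the two `encode_*` methods, provided both still return vectors in one embedding space.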