US 12,271,983 B2
Generating unified embeddings from multi-modal canvas inputs for image retrieval
Zhifei Zhang, San Jose, CA (US); Zhe Lin, Fremont, CA (US); Scott Cohen, Sunnyvale, CA (US); and Kevin Gary Smith, Lehi, UT (US)
Assigned to Adobe Inc., San Jose, CA (US)
Filed by Adobe Inc., San Jose, CA (US)
Filed on Jun. 28, 2022, as Appl. No. 17/809,494.
Prior Publication US 2023/0419571 A1, Dec. 28, 2023
Int. Cl. G06T 11/60 (2006.01); G06F 3/0482 (2013.01); G06F 16/532 (2019.01); G06T 11/20 (2006.01)
CPC G06T 11/60 (2013.01) [G06F 16/532 (2019.01); G06T 11/203 (2013.01); G06F 3/0482 (2013.01); G06T 2200/24 (2013.01); G06T 2207/20084 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A computer-implemented method comprising:
receiving, from a client device, a multi-modal search input for conducting an image search, the multi-modal search input comprising a canvas and one or more sketch query components positioned on the canvas;
generating, using a multi-modal embedding neural network and for a first segment of the multi-modal search input, a first segment-level semantic embedding and a first segment-level layout embedding representing the first segment;
generating, using the multi-modal embedding neural network and for a second segment of the multi-modal search input, a second segment-level semantic embedding and a second segment-level layout embedding representing the second segment;
determining a semantic embedding that incorporates semantic information from the first segment-level semantic embedding and the second segment-level semantic embedding;
determining a layout embedding that incorporates layout information from the first segment-level layout embedding and the second segment-level layout embedding;
generating a unified embedding for the multi-modal search input from the semantic embedding and the layout embedding; and
retrieving, utilizing the unified embedding, one or more digital images that are responsive to the multi-modal search input.
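
The following is a minimal, illustrative sketch of the pipeline the claim recites: per-segment semantic and layout embeddings, aggregation into a single semantic embedding and a single layout embedding, fusion into a unified embedding, and similarity-based retrieval. It is not the patented implementation. The random projections stand in for the trained multi-modal embedding neural network, and the mean-pooling, concatenation, and cosine-similarity choices are assumptions for demonstration; all function and variable names are hypothetical.

```python
# Illustrative sketch only; encoder weights, pooling, fusion, and the
# similarity metric are assumptions, not the claimed implementation.
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM = 64       # assumed embedding width
CONTENT_DIM = 128  # assumed size of raw per-segment content features

# Fixed random projections stand in for trained encoder weights.
content_proj = rng.standard_normal((EMB_DIM, CONTENT_DIM))
layout_proj = rng.standard_normal((EMB_DIM, 4))  # canvas bbox -> layout

def embed_segment(segment):
    """Return (semantic, layout) segment-level embeddings for one canvas
    segment, mimicking the two outputs of the multi-modal network."""
    semantic = content_proj @ np.asarray(segment["content"], dtype=float)
    layout = layout_proj @ np.asarray(segment["bbox"], dtype=float)
    return semantic, layout

def unified_embedding(segments):
    """Combine segment-level embeddings into one query embedding.

    Mean-pooling then concatenation is one simple way to incorporate
    semantic and layout information from every segment into a single
    unified embedding."""
    sem_parts, lay_parts = zip(*(embed_segment(s) for s in segments))
    semantic = np.mean(sem_parts, axis=0)  # combined semantic embedding
    layout = np.mean(lay_parts, axis=0)    # combined layout embedding
    unified = np.concatenate([semantic, layout])
    return unified / np.linalg.norm(unified)

def retrieve(query_emb, index_embs, k=3):
    """Return indices of the k most similar indexed images by cosine
    similarity (rows of index_embs are assumed L2-normalized)."""
    scores = index_embs @ query_emb
    return np.argsort(scores)[::-1][:k]

# Toy usage: two sketch query components positioned on a canvas,
# each with placeholder content features and a (x, y, w, h) bbox.
canvas_segments = [
    {"content": rng.standard_normal(CONTENT_DIM), "bbox": [0.1, 0.2, 0.3, 0.3]},
    {"content": rng.standard_normal(CONTENT_DIM), "bbox": [0.6, 0.5, 0.25, 0.4]},
]
query = unified_embedding(canvas_segments)

# A toy index of 100 pre-computed image embeddings of matching width.
index = rng.standard_normal((100, 2 * EMB_DIM))
index /= np.linalg.norm(index, axis=1, keepdims=True)
print("Top matches:", retrieve(query, index))
```

In this sketch the layout embedding is derived only from each segment's bounding box on the canvas; keeping the semantic and layout halves separate until the final fusion step mirrors the claim's structure, in which segment-level semantic and layout embeddings are aggregated independently before the unified embedding is formed.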