| CPC G06T 11/60 (2013.01) [G06F 16/532 (2019.01); G06T 11/203 (2013.01); G06F 3/0482 (2013.01); G06T 2200/24 (2013.01); G06T 2207/20084 (2013.01)] | 20 Claims |

|
1. A computer-implemented method comprising:
receiving, from a client device, a multi-modal search input for conducting an image search, the multi-modal search input comprising a canvas and one or more sketch query components positioned on the canvas;
generating, using a multi-modal embedding neural network and for a first segment of the multi-modal search input, a first segment-level semantic embedding and a first segment-level layout embedding representing the first segment;
generating, using the multi-modal embedding neural network and for a second segment of the multi-modal search input, a second segment-level semantic embedding and a second segment-level layout embedding representing the second segment;
determining a semantic embedding that incorporates semantic information from the first segment-level semantic embedding and the second segment-level semantic embedding;
determining a layout embedding that incorporates layout information from the first segment-level layout embedding and the second segment-level layout embedding;
generating a unified embedding for the multi-modal search input from the semantic embedding and the layout embedding; and
retrieving one or more digital images utilizing the unified embedding that is responsive to the multi-modal search input.
|
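The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. The claim does not say how the segment-level embeddings are combined or how retrieval is scored, so mean pooling, concatenation with L2 normalization, and cosine-similarity ranking are all assumptions made for illustration. The function `embed_segment` is a hypothetical stand-in for the multi-modal embedding neural network: it takes a precomputed feature vector and a normalized bounding box `(x, y, w, h)` on the canvas rather than running a trained model. All names below (`embed_segment`, `unified_embedding`, `retrieve`, `image_a`, `image_b`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Box = Tuple[float, float, float, float]   # (x, y, width, height), normalized to the canvas
Segment = Tuple[Sequence[float], Box]     # (precomputed features, bounding box)


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def mean_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized vectors (assumed aggregation)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def embed_segment(features: Sequence[float], box: Box) -> Tuple[Vector, Vector]:
    """Hypothetical stand-in for the multi-modal embedding neural network.

    Returns a (segment-level semantic embedding, segment-level layout embedding) pair.
    """
    x, y, w, h = box
    semantic = l2_normalize(features)
    layout = [x + w / 2, y + h / 2, w, h]  # box center and size
    return semantic, layout


def unified_embedding(segments: Sequence[Segment]) -> Vector:
    """Pool segment-level embeddings, then fuse them into one unit-length vector."""
    pairs = [embed_segment(features, box) for features, box in segments]
    semantic = mean_pool([s for s, _ in pairs])  # semantic embedding
    layout = mean_pool([l for _, l in pairs])    # layout embedding
    return l2_normalize(semantic + layout)       # concatenate, then normalize


def retrieve(query: Vector, index: Dict[str, Vector], k: int = 1) -> List[str]:
    """Rank indexed images by cosine similarity (all vectors are unit length)."""
    ranked = sorted(index.items(), key=lambda item: -dot(query, item[1]))
    return [name for name, _ in ranked[:k]]


# Toy index: each image is described by two segments.
index = {
    "image_a": unified_embedding([
        ([1.0, 0.0, 0.0], (0.0, 0.5, 0.4, 0.4)),
        ([0.0, 1.0, 0.0], (0.6, 0.1, 0.3, 0.8)),
    ]),
    "image_b": unified_embedding([
        ([0.0, 0.0, 1.0], (0.1, 0.1, 0.2, 0.2)),
        ([0.0, 0.0, 1.0], (0.5, 0.1, 0.2, 0.2)),
    ]),
}

# Query: two sketch components placed roughly where image_a's objects sit.
query = unified_embedding([
    ([1.0, 0.0, 0.0], (0.05, 0.5, 0.4, 0.4)),
    ([0.0, 1.0, 0.0], (0.6, 0.15, 0.3, 0.8)),
])

results = retrieve(query, index, k=1)
print(results)
```

Mean pooling is used here only because it is simple. It discards which semantic content belongs to which position, so a system that must tell apart two layouts containing the same objects in swapped positions would need a different aggregation.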