US 12,216,703 B2
Visual search determination for text-to-image replacement
Harshit Kharbanda, Pleasanton, CA (US); Christopher James Kelley, Orinda, CA (US); and Pendar Yousefi, Sunnyvale, CA (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Oct. 18, 2022, as Appl. No. 17/968,430.
Prior Publication US 2024/0126807 A1, Apr. 18, 2024
Int. Cl. G06F 16/532 (2019.01); G06F 16/538 (2019.01); G06F 16/54 (2019.01)

CPC G06F 16/532 (2019.01) [G06F 16/538 (2019.01); G06F 16/54 (2019.01)]

20 Claims

1. A computer-implemented method for multimodal searching, the method comprising:

obtaining, by a computing system comprising one or more processors, a search query, wherein the search query comprises one or more words and one or more additional words, wherein the one or more words comprise one or more visually descriptive terms, and wherein the one or more additional words are associated with a different descriptive aspect of the search query than the one or more words;

processing, by the computing system, the search query with a machine-learned model to determine the one or more words comprise a visual intent, wherein the visual intent is associated with one or more visual features;

in response to determining the one or more words comprise a visual intent, providing, by the computing system, an image-selection interface for display, wherein the image-selection interface comprises a plurality of images for selection, wherein the image-selection interface is provided for display based on the determination of the one or more words comprising the visual intent;

obtaining, by the computing system, selection data, wherein the selection data is descriptive of a selection of an image;

replacing, by the computing system, the one or more words with the image;

providing, by the computing system, the image for display as replacement for the one or more words;

determining, by the computing system, one or more search results associated with the one or more additional words and the one or more visual features of the image; and

providing, by the computing system, the one or more search results as an output.
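
The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The machine-learned model is swapped for a fixed vocabulary lookup (`VISUAL_TERMS`), visual features are modeled as plain string sets, and every name here (`Image`, `Item`, `split_query`, `image_selection_interface`, `replace_words_with_image`, `search`, `multimodal_search`) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# ASSUMPTION: stand-in for the machine-learned model. A real system would use
# a trained classifier; here a fixed vocabulary marks visually descriptive terms.
VISUAL_TERMS = {"floral", "striped", "plaid"}


@dataclass(frozen=True)
class Image:
    image_id: str
    features: frozenset  # visual features, modeled as strings (assumption)


@dataclass(frozen=True)
class Item:
    title: str
    features: frozenset


def split_query(query):
    """Separate visually descriptive words from the additional words."""
    words = query.lower().split()
    visual = [w for w in words if w in VISUAL_TERMS]
    additional = [w for w in words if w not in VISUAL_TERMS]
    return visual, additional


def image_selection_interface(visual_words, gallery):
    """Return the images offered for selection: those sharing a visual word."""
    wanted = set(visual_words)
    return [img for img in gallery if img.features & wanted]


def replace_words_with_image(query, visual_words, image):
    """Build the displayed query: the image stands in for the visual words."""
    tokens, replaced = [], False
    for word in query.lower().split():
        if word in visual_words:
            if not replaced:  # one image replaces the whole run of visual words
                tokens.append(image)
                replaced = True
        else:
            tokens.append(word)
    return tokens


def search(additional_words, image, catalog):
    """Match items on the additional words AND the image's visual features."""
    needed = set(additional_words)
    return [
        item for item in catalog
        if needed <= set(item.title.lower().split())
        and image.features <= item.features
    ]


def multimodal_search(query, gallery, catalog, choose):
    """Run the claimed steps in order; `choose` simulates the user's selection."""
    visual, additional = split_query(query)
    if not visual:
        # Claim 1 does not cover this branch; the sketch returns no results.
        return query.lower().split(), []
    candidates = image_selection_interface(visual, gallery)
    if not candidates:
        return query.lower().split(), []
    selected = choose(candidates)  # selection data
    tokens = replace_words_with_image(query, visual, selected)
    return tokens, search(additional, selected, catalog)


if __name__ == "__main__":
    gallery = [
        Image("img-1", frozenset({"floral", "pink"})),
        Image("img-2", frozenset({"floral", "blue"})),
    ]
    catalog = [
        Item("Pink summer dress", frozenset({"floral", "pink"})),
        Item("Blue summer dress", frozenset({"floral", "blue"})),
        Item("Pink scarf", frozenset({"floral", "pink"})),
    ]
    tokens, results = multimodal_search(
        "floral dress", gallery, catalog, choose=lambda imgs: imgs[0]
    )
    print([r.title for r in results])  # ['Pink summer dress']
```

In this toy run, "floral" is the visually descriptive word and "dress" is the additional word. Selecting the first offered image narrows the results to items that match "dress" and carry all of that image's features, which is the combination the final determining step describes.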