US 12,405,995 B2
Using large generative models with improved grounding to improve image context queries
Vishrav Chaudhary, Covington, WA (US); Bradley Moore Abrams, Palo Alto, CA (US); Kamal Ginotra, Kirkland, WA (US); Owais Khan Mohammed, Bellevue, WA (US); Barun Patra, Vancouver (CA); and Michael Lawrence Valenzuela, Yelm, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Dec. 4, 2023, as Appl. No. 18/527,923.
Prior Publication US 2025/0181631 A1, Jun. 5, 2025
Int. Cl. G06F 16/532 (2019.01); G06F 16/242 (2019.01)
CPC G06F 16/532 (2019.01) [G06F 16/243 (2019.01)] 20 Claims
1. A computer-implemented method for providing text responses to image-based queries:
based on receiving an input image and a natural language query corresponding to the input image, obtaining reverse image search grounding information for the input image;
providing a comprehensive image prompt and the input image to a visual-based large generative model to generate visual image grounding information;
generating a text response to the natural language query corresponding to the input image using a large generative language model based at least in part on the reverse image search grounding information and the visual image grounding information; and
providing the text response in response to the natural language query.
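
The steps recited in claim 1 can be illustrated with a minimal Python sketch. This is not the patented implementation: the class name `GroundedAnswerPipeline`, the three callables, the wording of the comprehensive image prompt, and the layout of the final prompt are all hypothetical, and the stand-in functions return canned strings where a real system would call a reverse image search service, a visual-based large generative model, and a large generative language model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GroundedAnswerPipeline:
    """Hypothetical orchestration of the four steps of claim 1."""

    # Each callable is an assumed stand-in for an external service or model.
    reverse_image_search: Callable[[bytes], str]
    visual_model: Callable[[str, bytes], str]
    language_model: Callable[[str], str]
    # Assumed wording; the claim only says "a comprehensive image prompt".
    comprehensive_image_prompt: str = (
        "Describe everything visible in this image in detail."
    )

    def answer(self, image: bytes, query: str) -> str:
        # Step 1: obtain reverse image search grounding information.
        search_grounding = self.reverse_image_search(image)
        # Step 2: give the comprehensive prompt and the image to the
        # visual model to obtain visual image grounding information.
        visual_grounding = self.visual_model(
            self.comprehensive_image_prompt, image
        )
        # Step 3: generate the text response from both grounding sources.
        prompt = (
            f"Reverse image search results:\n{search_grounding}\n\n"
            f"Image description:\n{visual_grounding}\n\n"
            f"Question: {query}"
        )
        # Step 4: return the text response to the caller.
        return self.language_model(prompt)


# Canned stand-ins so the sketch runs without any external service.
def fake_search(image: bytes) -> str:
    return "Possible match: Space Needle, Seattle"


def fake_visual(prompt: str, image: bytes) -> str:
    return "A tall tower with a saucer-shaped top against a blue sky"


def fake_llm(prompt: str) -> str:
    # Echoes the prompt so the grounding that reached the model is visible.
    return "ANSWER GIVEN:\n" + prompt


pipeline = GroundedAnswerPipeline(fake_search, fake_visual, fake_llm)
response = pipeline.answer(b"\x89PNG...", "What building is this?")
print(response)
```

Passing the three components as callables keeps the sketch independent of any particular search or model API. The claim requires only that the text response be based at least in part on both kinds of grounding information, which the combined prompt reflects.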