US 12,266,065 B1
Visual indicators of generative model response details
Harshit Kharbanda, Pleasanton, CA (US); Louis Wang, San Francisco, CA (US); Christopher James Kelley, Orinda, CA (US); Jessica Lee, Brooklyn, NY (US); Igor Bonaci, Canton Schwyz (CH); and Daniel Valcarce Silva, Zürich (CH)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Jan. 10, 2024, as Appl. No. 18/409,268.
Claims priority of provisional application 63/616,304, filed on Dec. 29, 2023.
Int. Cl. G06T 19/00 (2011.01); G06V 20/20 (2022.01)
CPC G06T 19/006 (2013.01) [G06V 20/20 (2022.01)]
20 Claims
 
1. A computing system for augmented-reality annotations, the system comprising:
one or more processors; and
one or more non-transitory computer-readable media that collectively store instructions that, when executed by the one or more processors, cause the computing system to perform operations, the operations comprising:
obtaining a user input, wherein the user input comprises a query associated with a user environment;
obtaining image data descriptive of the user environment, wherein the image data depicts at least a portion of the user environment;
processing the query and the image data with a vision language model to generate a model-generated query;
processing the model-generated query with a search engine to determine a plurality of search results;
processing the user input and at least a subset of the plurality of search results with a generative model to generate a model-generated response, wherein the generative model comprises a machine-learned autoregressive language model, wherein the model-generated response comprises a predicted response to the query, and wherein the model-generated response is associated with an object;
processing the model-generated response and the image data with an image augmentation model to generate an augmented image, wherein the augmented image is descriptive of the user environment annotated based on the model-generated response, and wherein the image augmentation model annotates the image data based on detecting the object in the image data; and
providing the augmented image for display.
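
The data flow recited in claim 1 can be sketched as follows. This is only an illustrative sketch, not the patented implementation: every name in it (`answer_with_annotations`, `ModelResponse`, `AugmentedImage`, the `fake_*` stand-ins, and the `top_k` parameter) is hypothetical, the four models are passed in as plain callables, and the augmented image is assumed to be the original image bytes plus a list of overlay annotations.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (x, y, width, height) in pixels


@dataclass
class ModelResponse:
    text: str          # predicted response to the user's query
    object_label: str  # object the response is associated with


@dataclass
class AugmentedImage:
    image: bytes  # original image data (assumption: overlays kept separately)
    annotations: List[Tuple[Box, str]] = field(default_factory=list)


def answer_with_annotations(
    query: str,
    image: bytes,
    vlm: Callable[[str, bytes], str],
    search: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], ModelResponse],
    detect: Callable[[bytes, str], Optional[Box]],
    top_k: int = 3,
) -> AugmentedImage:
    # Vision language model: user query + image -> model-generated query.
    model_query = vlm(query, image)
    # Search engine: model-generated query -> plurality of search results.
    results = search(model_query)
    # Generative model: user input + a subset of the results -> response.
    response = generate(query, results[:top_k])
    # Image augmentation: annotate only if the associated object is detected.
    box = detect(image, response.object_label)
    annotations = [(box, response.text)] if box is not None else []
    return AugmentedImage(image=image, annotations=annotations)


# Stand-in callables so the sketch runs without any real models or services.
def fake_vlm(query: str, image: bytes) -> str:
    return f"{query} (scene: houseplant)"


def fake_search(model_query: str) -> List[str]:
    return ["result A", "result B", "result C", "result D"]


def fake_generate(user_input: str, results: List[str]) -> ModelResponse:
    return ModelResponse(
        text=f"Answer based on {len(results)} results",
        object_label="houseplant",
    )


def fake_detect(image: bytes, label: str) -> Optional[Box]:
    return (10, 20, 100, 150) if label == "houseplant" else None
```

Calling `answer_with_annotations("How often should I water this?", b"raw", fake_vlm, fake_search, fake_generate, fake_detect)` returns an `AugmentedImage` whose single annotation pairs the detected bounding box with the response text; that object is what the final "providing the augmented image for display" step would hand to a renderer.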