US 12,033,620 B1
Systems and methods for analyzing text extracted from images and performing appropriate transformations on the extracted text
Harshit Kharbanda, Pleasanton, CA (US); Jessica Lee, Brooklyn, NY (US); Christopher James Kelley, Orinda, CA (US); Fabian Roth, Zürich (CH); Dounia Berrada, Saratoga, CA (US); Samer Hassan Hassan, Saratoga, CA (US); Afroz Mohiuddin, Campbell, CA (US); Mikhail Khalman, San Francisco, CA (US); Ali Essam Ali Elqursh, San Jose, CA (US); and Belinda Luna Zeng, Cupertino, CA (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Sep. 8, 2023, as Appl. No. 18/463,951.
Int. Cl. G06F 3/0483 (2013.01); G06F 16/30 (2019.01); G06F 16/33 (2019.01); G06F 16/583 (2019.01); G06V 10/778 (2022.01); G06V 30/14 (2022.01); G06V 30/148 (2022.01); G10L 15/183 (2013.01); G10L 15/22 (2006.01); G10L 15/30 (2013.01)
CPC G10L 15/183 (2013.01) [G06F 16/5846 (2019.01); G06V 10/778 (2022.01); G06V 30/1456 (2022.01); G06V 30/153 (2022.01); G10L 15/22 (2013.01); G10L 15/30 (2013.01)]
18 Claims
13. A computer-implemented method for responding to queries about an image, the method comprising:
obtaining, by a computing system with one or more processors, an image, wherein the image depicts a first set of textual content;
determining, by the computing system, one or more characteristics of the first set of textual content, wherein the one or more characteristics of the first set of textual content include a density of the first set of textual content;
determining, by the computing system, a response type from a plurality of response types based on the one or more characteristics, wherein the plurality of response types includes a summarization response, an explanation response, and a query response, wherein the determined response type is a summarization response, and wherein determining a response type from a plurality of response types based on the one or more characteristics further comprises:
determining the density for the first set of textual content within the image;
responsive to a determination that the density for the first set of textual content within the image satisfies a threshold, determining that the response type is a summarization response type; and
updating a user interface to include a summarize user interface element;
generating, by the computing system, a model input, wherein the model input comprises data descriptive of the first set of textual content and a prompt associated with the response type;
providing, by the computing system, the model input as an input to a machine-learned language model;
receiving, by the computing system, a second set of text as an output of the machine-learned language model as a result of the machine-learned language model processing the model input; and
providing, by the computing system, the second set of text for display to a user, wherein the second set of text is associated with the response type.
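
The steps recited in claim 13 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim does not say how density is computed, what the threshold is, or how the prompts are worded, so the following are assumptions made only for illustration: density is taken as the fraction of image area covered by recognized text boxes, `DENSITY_THRESHOLD` is an arbitrary cutoff, the `PROMPTS` templates are invented, and the fallback to an explanation or query response when the threshold is not met is a guess. The function names are hypothetical, and `model` stands in for any machine-learned language model exposed as a text-to-text callable.

```python
from enum import Enum
from typing import Callable, List, Tuple


class ResponseType(Enum):
    SUMMARIZATION = "summarization"
    EXPLANATION = "explanation"
    QUERY = "query"


# Hypothetical: (x, y, width, height) of one recognized text region, in pixels.
Box = Tuple[int, int, int, int]

# Hypothetical cutoff; the claim only says the density "satisfies a threshold".
DENSITY_THRESHOLD = 0.4

# Hypothetical prompt templates, one per response type.
PROMPTS = {
    ResponseType.SUMMARIZATION: "Summarize the following text:",
    ResponseType.EXPLANATION: "Explain the following text:",
    ResponseType.QUERY: "Answer the user's question about the following text:",
}


def text_density(boxes: List[Box], image_width: int, image_height: int) -> float:
    """Assumed density measure: summed text-box area over image area, capped at 1.0."""
    image_area = image_width * image_height
    if image_area <= 0:
        raise ValueError("image dimensions must be positive")
    covered = sum(w * h for _, _, w, h in boxes)
    return min(covered / image_area, 1.0)


def choose_response_type(density: float, has_user_query: bool = False) -> ResponseType:
    """Pick summarization when the density meets the threshold (as in the claim).

    The fallback branch is an assumption; the claim does not specify it.
    """
    if density >= DENSITY_THRESHOLD:
        return ResponseType.SUMMARIZATION
    return ResponseType.QUERY if has_user_query else ResponseType.EXPLANATION


def build_model_input(text: str, response_type: ResponseType) -> str:
    """Combine the prompt for the response type with the extracted text."""
    return f"{PROMPTS[response_type]}\n\n{text}"


def respond(
    text: str,
    boxes: List[Box],
    image_width: int,
    image_height: int,
    model: Callable[[str], str],
) -> Tuple[ResponseType, List[str], str]:
    """Run the claimed steps end to end and return (type, UI elements, model output)."""
    density = text_density(boxes, image_width, image_height)
    response_type = choose_response_type(density)
    # Stand-in for "updating a user interface to include a summarize user interface element".
    ui_elements = ["summarize"] if response_type is ResponseType.SUMMARIZATION else []
    second_text = model(build_model_input(text, response_type))
    return response_type, ui_elements, second_text
```

With a stub model such as `lambda prompt: prompt[:80]`, calling `respond(extracted_text, boxes, width, height, model)` on a text-heavy image would select the summarization branch and return the summarize element with the model output. A real system would replace the stub with a call to an actual language model and obtain the text and boxes from an OCR step.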