US 12,147,732 B2
Analyzing graphical user interfaces to facilitate automatic interaction
Joseph Lange, Zurich (CH); Asier Aguirre, Adliswil (CH); Olivier Siegenthaler, Zurich (CH); and Michal Pryt, Zurich (CH)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by GOOGLE LLC, Mountain View, CA (US)
Filed on Aug. 16, 2023, as Appl. No. 18/234,760.
Application 18/234,760 is a continuation of application No. 17/251,468, granted, now 11,775,254, previously published as PCT/US2020/016143, filed on Jan. 31, 2020.
Prior Publication US 2023/0393810 A1, Dec. 7, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 3/16 (2006.01); G06T 7/70 (2017.01); G06V 20/00 (2022.01); G10L 15/26 (2006.01)
CPC G06F 3/167 (2013.01) [G06T 7/70 (2017.01); G06V 20/00 (2022.01); G10L 15/26 (2013.01); G06T 2200/24 (2013.01)] 17 Claims
 
1. A method implemented using one or more processors, comprising:
identifying a target visual cue to be located in a graphical user interface (“GUI”) comprising an interactive webpage accessible at a uniform resource locator (“URL”), wherein the interactive webpage comprises one or more interactive elements;
obtaining a document object model (DOM) of the interactive webpage, wherein the DOM of the interactive webpage comprises one or more interactive elements;
obtaining a bitmap screenshot of the GUI;
performing object recognition processing on the bitmap screenshot of the GUI to generate output indicative of a location of a detected instance of the target visual cue in the bitmap screenshot;
based on the location of the detected instance of the target visual cue, applying features of the bitmap screenshot and the DOM as inputs across a machine learning model to generate output;
based on the output, identifying one or more of the interactive elements of the GUI as corresponding to the target visual cue;
automatically populating the one or more identified interactive elements with data;
validating that submission of the data resulted in a next state of the interactive webpage; and
in response to the validating, generating, and storing in association with the URL of the interactive webpage, a script that is subsequently executable in association with the interactive webpage and a subsequent free-form natural language input to trigger subsequent automatic population of the one or more identified interactive elements with data determined from a subsequent user intent determined from the subsequent free-form natural language input.
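The claim ties a visual cue detected in the screenshot to interactive elements in the DOM. It does this by feeding screenshot and DOM features to a machine learning model. The sketch below shows only the shape of that matching step, under several assumptions. Each DOM element's rendered bounding box is assumed to be known in screenshot pixel coordinates. The tag set treated as "interactive" is an assumption. A simple inverse-distance score (`score`) stands in for the claimed model; it is not the patented model. All names (`Box`, `DomElement`, `match_elements`, `INTERACTIVE_TAGS`) are hypothetical.

```python
from dataclasses import dataclass
import math

# Hypothetical set of HTML tags treated as "interactive elements".
INTERACTIVE_TAGS = {"input", "select", "textarea", "button"}


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in screenshot pixel coordinates."""
    x: float
    y: float
    w: float
    h: float

    def center(self) -> tuple[float, float]:
        return (self.x + self.w / 2, self.y + self.h / 2)


@dataclass(frozen=True)
class DomElement:
    node_id: str
    tag: str
    box: Box  # rendered box of the element, assumed already mapped to screenshot pixels


def score(cue: Box, el: DomElement) -> float:
    """Stand-in for the claimed ML model: closer element centers score higher."""
    cx, cy = cue.center()
    ex, ey = el.box.center()
    return 1.0 / (1.0 + math.hypot(cx - ex, cy - ey))


def match_elements(cue: Box, dom: list[DomElement], top_k: int = 1) -> list[DomElement]:
    """Rank interactive DOM elements by how well they match the detected cue location."""
    candidates = [el for el in dom if el.tag in INTERACTIVE_TAGS]
    ranked = sorted(candidates, key=lambda el: score(cue, el), reverse=True)
    return ranked[:top_k]
```

For example, suppose the detected cue is a magnifying-glass icon with an adjacent search field. The non-interactive `div` that draws the icon is filtered out, and the nearest interactive element, the text input, ranks first. A learned model would replace `score` with features drawn from both the pixels and the DOM attributes.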
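The final steps of the claim populate the identified elements, validate that the submission reached a next page state, and store a replayable script keyed by the page URL. The sketch below illustrates that flow under stated assumptions. A script is modeled as a list of (DOM node id, intent slot name) pairs. Validation is approximated by checking that an opaque page-state label changed. A toy regular-expression slot filler (`parse_intent`) substitutes for real natural-language intent determination. The classes, functions, slot names, and the example URL are all hypothetical.

```python
from dataclasses import dataclass, field
import re


@dataclass
class Script:
    # Each step: (DOM node id to populate, name of the intent slot supplying its value).
    steps: list[tuple[str, str]]


@dataclass
class ScriptStore:
    by_url: dict[str, Script] = field(default_factory=dict)

    def save_if_validated(self, url: str, script: Script,
                          state_before: str, state_after: str) -> bool:
        """Store the script for this URL only if submission led to a different page state."""
        if state_after != state_before:
            self.by_url[url] = script
            return True
        return False


def parse_intent(utterance: str) -> dict[str, str]:
    """Toy slot filler (hypothetical): extracts an 'from X to Y' pair of single words."""
    m = re.search(r"from (\w+) to (\w+)", utterance, re.IGNORECASE)
    return {"origin": m.group(1), "destination": m.group(2)} if m else {}


def replay(store: ScriptStore, url: str, utterance: str) -> dict[str, str]:
    """Map each stored node id to the slot value derived from a new free-form utterance."""
    script = store.by_url.get(url)
    if script is None:
        return {}
    slots = parse_intent(utterance)
    return {node: slots[slot] for node, slot in script.steps if slot in slots}
```

Keying the store by URL means that a later utterance only needs to reach the intent parser. The element-location work from the first visit is not repeated. In this sketch, an unvalidated script is simply discarded.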