| CPC G06F 3/167 (2013.01) [G06T 7/70 (2017.01); G06V 20/00 (2022.01); G10L 15/26 (2013.01); G06T 2200/24 (2013.01)] | 17 Claims |

|
1. A method implemented using one or more processors, comprising:
identifying a target visual cue to be located in a graphical user interface (“GUI”) comprising an interactive webpage accessible at a uniform resource locator (“URL”), wherein the interactive webpage comprises one or more interactive elements;
obtaining a document object model (DOM) of the interactive webpage, wherein the DOM of the interactive webpage comprises one or more interactive elements;
obtaining a bitmap screenshot of the GUI;
performing object recognition processing on the bitmap screenshot of the GUI to generate output indicative of a location of a detected instance of the target visual cue in the bitmap screenshot;
based on the location of the detected instance of the target visual cue, applying features of the bitmap screenshot and the DOM as inputs across a machine learning model to generate output;
based on the output, identifying one or more of the interactive elements of the GUI as corresponding to the target visual cue;
automatically populating the one or more identified interactive elements with data;
validating that submission of the data resulted in a next state of the interactive webpage; and
in response to the validating, generating, and storing in association with the URL of the interactive webpage, a script that is subsequently executable in association with the interactive webpage and a subsequent free-form natural language input to trigger subsequent automatic population of the one or more identified interactive elements with data determined from a subsequent user intent determined from the subsequent free-form natural language input.
|
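
The steps recited in claim 1 can be illustrated with a minimal sketch, which is not the patented implementation. Every name in it (`Element`, `match_elements`, `learn`, `SCRIPT_STORE`, the example URL, and the `submit` callback) is hypothetical. The object recognition step is assumed to have already produced the pixel location of the visual cue, and the claimed machine learning model is replaced by a plain nearest-distance heuristic so that the example runs without external libraries.

```python
from dataclasses import dataclass
from math import hypot
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class Element:
    """Hypothetical interactive DOM element with its rendered position."""
    element_id: str
    tag: str
    center: Tuple[float, float]  # position in screenshot pixel coordinates
    value: str = ""

# Hypothetical store of generated scripts, keyed by the webpage URL.
SCRIPT_STORE: Dict[str, dict] = {}

def match_elements(cue_center: Tuple[float, float],
                   elements: List[Element],
                   max_distance: float = 150.0) -> List[str]:
    """Stand-in for the claimed model: pick elements near the detected cue.

    Assumption: a real system would combine screenshot and DOM features
    in a trained model; here only pixel distance is used.
    """
    scored = sorted(
        (hypot(e.center[0] - cue_center[0], e.center[1] - cue_center[1]),
         e.element_id)
        for e in elements
    )
    return [eid for dist, eid in scored if dist <= max_distance]

def learn(url: str,
          cue_center: Tuple[float, float],
          elements: List[Element],
          data: str,
          slot: str,
          submit: Callable[[List[Element]], bool]) -> Optional[dict]:
    """Populate the matched elements, validate, then store a script."""
    ids = match_elements(cue_center, elements)
    for e in elements:
        if e.element_id in ids:
            e.value = data  # automatic population
    # Store the script only if submission reached a next state.
    if ids and submit(elements):
        SCRIPT_STORE[url] = {
            "url": url,
            "fill": [{"element_id": i, "slot": slot} for i in ids],
        }
    return SCRIPT_STORE.get(url)

# Usage: a search box next to a magnifier icon detected at (180, 50).
elements = [
    Element("q", "input", (210.0, 48.0)),
    Element("newsletter", "input", (400.0, 900.0)),
]
script = learn(
    "https://example.test/search",
    (180.0, 50.0),
    elements,
    "red shoes",
    "query",
    submit=lambda els: any(e.value for e in els),  # fake next-state check
)
```

The stored entry records which elements to fill and which slot of a later natural language request supplies the value, which is roughly the role the claim assigns to the generated script. How the slot value is extracted from the free-form input is outside this sketch.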