US 12,405,774 B2
Creating user interface using machine learning
Zifeng Huang, Emeryville, CA (US); Yang Li, Palo Alto, CA (US); Xin Zhou, Mountain View, CA (US); Gang Li, Mountain View, CA (US); and John Francis Canny, Berkeley, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Jul. 6, 2023, as Appl. No. 18/348,191.
Application 18/348,191 is a continuation of application No. 18/046,428, filed on Oct. 13, 2022, granted, now 11,740,879.
Claims priority of provisional application 63/255,366, filed on Oct. 13, 2021.
Prior Publication US 2023/0350651 A1, Nov. 2, 2023
Int. Cl. G06N 3/00 (2023.01); G06F 3/0484 (2022.01); G06F 8/33 (2018.01); G06F 8/38 (2018.01); G06F 40/40 (2020.01); G06N 3/0455 (2023.01)
CPC G06F 8/38 (2013.01) [G06F 3/0484 (2013.01); G06F 8/33 (2013.01); G06F 40/40 (2020.01); G06N 3/0455 (2023.01)] 12 Claims
OG exemplary drawing
 
1. A computer-implemented method of predicting a graphical user interface, comprising:
generating, for a natural language textual description, using a pre-trained word embedding model, an encoded representation of the natural language textual description;
providing the encoded representation of the natural language textual description to a machine learned model that is configured to receive, as input, the encoded representation and generate, as output, a first embedding vector;
determining second embedding vectors, each second embedding vector determined from graphical attribute data describing a corresponding user interface image in a training dataset;
selecting one of the corresponding user interface images based on the first embedding vector and the second embedding vectors; and
providing the selected corresponding user interface image as output in response to the natural language textual description;
wherein the machine learned model comprises a first encoder and a second encoder trained on a training dataset comprising a plurality of training samples, and wherein:
each training sample includes:
a user interface image that includes a plurality of graphical elements; and
a natural language textual description of the user interface image;
the first encoder receives, as input, the encoded representation of the natural language textual description and generates, as output, the first embedding vector; and
the second encoder receives, as input, the graphical attribute data and generates, as output, the second embedding vector.
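The retrieval logic recited in claim 1 can be illustrated with a minimal sketch: a first encoder maps the encoded text description to an embedding, a second encoder maps each candidate image's graphical attribute data to its own embedding, and the image whose embedding is most similar to the text embedding is selected. The encoders below are fixed random projections standing in for the trained encoders, and all names (`first_encoder`, `second_encoder`, `select_ui_image`) are hypothetical, not from the patent.

```python
import numpy as np

# Stand-in projections for the trained first (text) and second (UI) encoders.
# In the claimed method these would be learned; here they are fixed and random.
rng = np.random.default_rng(0)
W_text = rng.normal(size=(8, 4))  # 8-dim encoded text description -> 4-dim embedding
W_ui = rng.normal(size=(6, 4))    # 6-dim graphical attribute data -> 4-dim embedding

def first_encoder(text_encoding: np.ndarray) -> np.ndarray:
    """Generate the first embedding vector from the encoded text description."""
    v = text_encoding @ W_text
    return v / np.linalg.norm(v)

def second_encoder(graphical_attributes: np.ndarray) -> np.ndarray:
    """Generate a second embedding vector from one image's graphical attribute data."""
    v = graphical_attributes @ W_ui
    return v / np.linalg.norm(v)

def select_ui_image(text_encoding, attribute_rows, image_ids):
    """Select the UI image whose second embedding is closest (by cosine
    similarity) to the first embedding derived from the text description."""
    q = first_encoder(text_encoding)
    scores = [float(q @ second_encoder(a)) for a in attribute_rows]
    return image_ids[int(np.argmax(scores))]
```

With unit-normalized embeddings, the dot product used above equals cosine similarity, which is one common way to realize "selecting ... based on the first embedding vector and the second embedding vectors"; the patent does not mandate this particular similarity measure.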