| CPC G06F 9/453 (2018.02) [G06F 40/20 (2020.01); G06N 3/045 (2023.01); G06T 11/60 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01)] | 20 Claims |

|
1. A non-transitory computer-readable medium storing instructions that, when executed by at least one processor, cause a computing device to:
generate a natural language embedding representing a visual modification request from an input natural language text that describes the visual modification request for a digital image;
generate an attention matrix based on correlations between a visual feature map of the digital image and the natural language embedding, the attention matrix comprising an indication of a degree of editing at various locations of the digital image;
modify the visual feature map of the digital image utilizing an expanded natural language embedding that comprises reweighted elements based on the attention matrix to generate a modified visual feature map; and
generate, utilizing a generative adversarial neural network and based on the modified visual feature map, a modified digital image that comprises visual modifications from the visual modification request that vary across one or more spatial locations of the digital image.
|
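Illustrative note on claim 1 (not part of the claim language): the attention and reweighting steps can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation. It assumes the correlation is a per-location dot product squashed by a sigmoid, that the "expanded" embedding is the text embedding copied to every spatial location, and that the modification is an element-wise addition. The function names `attention_matrix` and `modify_feature_map` and all numeric values are hypothetical, and the generative adversarial network step is omitted.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_matrix(feature_map, text_embedding):
    """One value in (0, 1) per spatial location: the assumed 'degree of editing'.

    Assumption: correlation = dot product, squashed by a sigmoid.
    """
    return [
        [1.0 / (1.0 + math.exp(-dot(cell, text_embedding))) for cell in row]
        for row in feature_map
    ]


def modify_feature_map(feature_map, text_embedding):
    """Add the spatially expanded, attention-reweighted embedding to the map."""
    attn = attention_matrix(feature_map, text_embedding)
    modified = []
    for row, attn_row in zip(feature_map, attn):
        new_row = []
        for cell, weight in zip(row, attn_row):
            # Expanded embedding at this location: the text embedding
            # scaled by the local attention weight.
            reweighted = [weight * t for t in text_embedding]
            new_row.append([c + r for c, r in zip(cell, reweighted)])
        modified.append(new_row)
    return attn, modified


# Hypothetical 2x2 visual feature map with 3 channels per location.
feature_map = [
    [[4.0, 0.0, 0.0], [-4.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0]],
]
text_embedding = [1.0, 0.0, 0.0]  # hypothetical embedding of the request

attn, modified = modify_feature_map(feature_map, text_embedding)
```

In this toy setting, locations whose features correlate strongly with the request receive a weight near 1 and are shifted by almost the full embedding, while anti-correlated locations receive a weight near 0 and are left nearly unchanged. That is one way an edit can vary across spatial locations.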
|
7. A computer-implemented method comprising:
generating, utilizing a generative adversarial neural network, a modified digital image reflecting modifications to a digital image indicated in a visual modification request from an input natural language embedding;
generating an additional natural language embedding that represents visual changes between the modified digital image and the digital image by utilizing an editing description network that outputs natural language embeddings that represent visual changes between digital images;
learning parameters of the editing description network from a comparison between the input natural language embedding and the additional natural language embedding; and
learning parameters of the generative adversarial neural network from multiple variations of the digital image and multiple natural language embeddings generated for the multiple variations of the digital image utilizing the editing description network.
|
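Illustrative note on claim 7 (not part of the claim language): the two learning steps can be sketched with scalar stand-ins for both networks. This is a toy sketch under strong assumptions. The generator and the editing description network are each reduced to a single hypothetical parameter (`g` and `d`), images and embeddings are short lists of numbers, the comparison is a mean squared error, and the adversarial loss of the generative network is omitted entirely. None of these choices come from the claims.

```python
def generator(image, embedding, g):
    # Toy stand-in for the generative network: shift each pixel by g * embedding.
    return [p + g * e for p, e in zip(image, embedding)]


def describe(image, edited, d):
    # Toy editing description network: embed the change between two images.
    return [d * (q - p) for p, q in zip(image, edited)]


def mean_sq(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


image = [0.2, 0.8]       # hypothetical digital image
request = [1.0, -0.5]    # hypothetical input natural language embedding
g, d, lr = 0.5, 1.0, 0.5

# Step A: learn the description network parameter by comparing the input
# embedding with the embedding recovered from (image, modified image).
for _ in range(5):
    edited = generator(image, request, g)
    change = [q - p for p, q in zip(image, edited)]
    recovered = describe(image, edited, d)
    # Gradient of mean((recovered - request)^2) with respect to d.
    grad_d = sum(2 * (r - e) * c
                 for r, e, c in zip(recovered, request, change)) / len(request)
    d -= lr * grad_d

# Step B: learn the generator parameter from several variations of the image,
# each paired with an embedding produced by the description network.
variations = [[0.4, 0.7], [-0.2, 1.1]]  # hypothetical variations of the image
pairs = [(v, describe(image, v, d)) for v in variations]
for _ in range(200):
    for target, emb in pairs:
        out = generator(image, emb, g)
        # Gradient of mean((out - target)^2) with respect to g.
        grad_g = sum(2 * (o - t) * e
                     for o, t, e in zip(out, target, emb)) / len(emb)
        g -= lr * grad_g
```

Step A corresponds to the comparison between the input embedding and the additional embedding. Step B corresponds to using the description network to label image variations so that the generator can be trained on pairs it was never given text for. In this toy the two parameters end up as approximate inverses of each other (`g * d` close to 1), which is the scalar analogue of the generator and the description network agreeing on what an edit means.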
|
15. A system comprising:
one or more memory devices comprising:
a digital image;
an input natural language text that describes a visual modification request for the digital image; and
a generative adversarial neural network; and
one or more processors configured to cause the system to:
generate a visual feature map for the digital image;
generate a modified visual feature map by modifying the visual feature map of the digital image utilizing an expanded natural language embedding from the input natural language text that comprises reweighted elements based on an attention matrix, the attention matrix comprising an indication of a degree of editing at various locations of the digital image; and
generate, utilizing the generative adversarial neural network and based on the modified visual feature map, a modified digital image comprising visual modifications from the visual modification request for the digital image.
|