| CPC G06T 11/001 (2013.01) [G06F 40/40 (2020.01); G06T 7/11 (2017.01); G06V 10/774 (2022.01); G06V 10/82 (2022.01); G06V 20/70 (2022.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01)] | 20 Claims |

|
1. A method, comprising:
receiving, by a processing device, a text prompt and an input image by a generator network, the generator network including a plurality of layers configured to perform respective edits for the text prompt at different resolutions;
generating, by the processing device, a plurality of masks defining local edit regions, respectively, of the input image for respective layers of the plurality of layers, the plurality of masks based on the text prompt;
generating, by the processing device using the generator network, an edited image by editing the input image based on the plurality of masks, the respective edits of the respective layers, and the text prompt; and
outputting, by the processing device, the edited image.
|
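For illustration only, the steps recited in claim 1 can be sketched as a toy pipeline: each layer works at its own resolution, receives its own prompt-dependent mask, and contributes an edit only inside that mask. This is a minimal sketch under stated assumptions, not the claimed generator network. The names `Layer`, `toy_mask`, `brighten`, and `edit_image`, the keyword-based mask rule, and the residual blending scheme are all hypothetical stand-ins for the learned components the claim describes.

```python
from dataclasses import dataclass
from typing import Callable, List

Grid = List[List[float]]  # a single-channel image as nested lists


def downsample(img: Grid, f: int) -> Grid:
    """Average-pool by factor f (assumes both dimensions are divisible by f)."""
    h, w = len(img), len(img[0])
    return [[sum(img[y * f + dy][x * f + dx]
                 for dy in range(f) for dx in range(f)) / (f * f)
             for x in range(w // f)]
            for y in range(h // f)]


def upsample(img: Grid, f: int) -> Grid:
    """Nearest-neighbour upsample by factor f."""
    return [[img[y // f][x // f] for x in range(len(img[0]) * f)]
            for y in range(len(img) * f)]


@dataclass
class Layer:
    """Hypothetical layer: edits at 1/factor of the full resolution."""
    factor: int
    edit: Callable[[Grid, str], Grid]


def toy_mask(prompt: str, h: int, w: int) -> Grid:
    """Hypothetical mask rule: left half if the prompt says 'left', else everything.
    A real system would predict this mask from the prompt with a learned model."""
    if "left" in prompt:
        return [[1.0 if x < w // 2 else 0.0 for x in range(w)] for _ in range(h)]
    return [[1.0] * w for _ in range(h)]


def brighten(img: Grid, prompt: str) -> Grid:
    """Hypothetical per-layer edit: add a constant to every pixel."""
    return [[v + 1.0 for v in row] for row in img]


def edit_image(image: Grid, prompt: str, layers: List[Layer]) -> Grid:
    """Apply each layer's edit at its resolution, restricted to that layer's mask."""
    out = image
    for layer in layers:
        low = downsample(out, layer.factor)
        mask = toy_mask(prompt, len(low), len(low[0]))  # one mask per layer
        edited = layer.edit(low, prompt)
        # Keep only the masked part of the change, then lift it to full resolution.
        delta = [[m * (e - l) for m, e, l in zip(mr, er, lr)]
                 for mr, er, lr in zip(mask, edited, low)]
        up = upsample(delta, layer.factor)
        out = [[o + d for o, d in zip(orow, drow)] for orow, drow in zip(out, up)]
    return out


image = [[0.0] * 4 for _ in range(4)]
layers = [Layer(factor=2, edit=brighten), Layer(factor=1, edit=brighten)]
result = edit_image(image, "brighten left", layers)
print(result[0])  # [2.0, 2.0, 0.0, 0.0]
```

In this toy run the coarse layer and the fine layer each brighten only the left half, so pixels outside the masks stay unchanged while the masked region accumulates both edits.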