US 12,277,671 B2
Multi-stage attention model for texture synthesis
Shouchang Guo, Ann Arbor, MI (US); Arthur Jules Martin Roullier, Paris (FR); Tamy Boubekeur, Paris (FR); Valentin Deschaintre, London (GB); Jerome Derel, Hauts-de-Seine (FR); and Paul Parneix, Paris (FR)
Assigned to ADOBE INC., San Jose, CA (US)
Filed by ADOBE INC., San Jose, CA (US)
Filed on Nov. 10, 2021, as Appl. No. 17/454,434.
Prior Publication US 2023/0144637 A1, May 11, 2023
Int. Cl. G06T 3/4046 (2024.01); G06T 5/77 (2024.01); G06T 7/11 (2017.01); G06T 7/40 (2017.01)
CPC G06T 3/4046 (2013.01) [G06T 5/77 (2024.01); G06T 7/11 (2017.01); G06T 7/40 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A method for image in-painting using a computing device including at least one processor and at least one memory, comprising:
receiving, using the at least one memory, an image comprising a first region depicting a texture and a second region to be in-painted with the texture;
segmenting, using the at least one processor, input features of the image to obtain a first input sequence of feature patches, wherein each patch of the first input sequence of feature patches corresponds to a region of pixels in the image, and wherein a first patch of the first input sequence of feature patches corresponds to the first region depicting the texture and a second patch of the first input sequence of feature patches corresponds to the second region to be in-painted with the texture;
transforming, using the at least one processor, the first input sequence of feature patches using a first attention network to obtain a first output sequence of feature patches, wherein the first attention network constructs each patch of the first output sequence of feature patches based on a nonlinear mapping from the first input sequence of feature patches determined by an attention map between the first input sequence of feature patches and the first output sequence of feature patches;
computing, using the at least one processor, a second input sequence of feature patches by subdividing patches of the first output sequence of feature patches using a first convolution operation on the first output sequence of feature patches, wherein the second input sequence of feature patches comprises more patches than the first output sequence of feature patches and smaller scale patches than the first output sequence of feature patches, and wherein individual patches of the second input sequence of feature patches correspond to fewer pixels of the image than the patches of the first output sequence of feature patches;
transforming, using the at least one processor, the second input sequence of feature patches using a second attention network to obtain a second output sequence of feature patches;
computing, using the at least one processor, a third input sequence of feature patches by combining patches of the second output sequence of feature patches using a second convolution operation on the second output sequence of feature patches, wherein the third input sequence of feature patches comprises fewer patches than the second output sequence of feature patches and same scale patches as the first output sequence of feature patches, and wherein individual patches of the third input sequence of feature patches correspond to more pixels of the image than the patches of the second output sequence of feature patches;
transforming, using the at least one processor, the third input sequence of feature patches using a third attention network to obtain a third output sequence of feature patches, wherein a first patch of the third input sequence of feature patches corresponds to the first region depicting the texture and a second patch of the third input sequence of feature patches corresponds to the second region and is in-painted with the texture; and
generating, using the at least one processor, a modified image by combining the third output sequence of feature patches, wherein the modified image comprises the texture within the second region to be in-painted.
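
The following is a minimal, illustrative sketch (in PyTorch) of the patch-segmentation and attention-transformation steps recited in the claim. Class names, dimensions, and the use of multi-head self-attention are assumptions made for illustration only and are not taken from the patent specification.

import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    # Segments input features into a sequence of feature patches; each
    # patch corresponds to a patch_size x patch_size region of pixels.
    def __init__(self, in_channels=3, embed_dim=256, patch_size=16):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, image):
        x = self.proj(image)                  # (B, C, H/ps, W/ps)
        return x.flatten(2).transpose(1, 2)   # (B, N, C): sequence of patches

class AttentionStage(nn.Module):
    # Transforms an input sequence of patches into an output sequence via a
    # nonlinear mapping determined by an attention map over the input patches.
    def __init__(self, embed_dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(embed_dim)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.mlp = nn.Sequential(nn.Linear(embed_dim, 4 * embed_dim),
                                 nn.GELU(),
                                 nn.Linear(4 * embed_dim, embed_dim))

    def forward(self, seq):
        attended, attn_map = self.attn(seq, seq, seq)
        seq = self.norm1(seq + attended)
        return self.norm2(seq + self.mlp(seq))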
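
The subdividing and combining convolutions that change the number and scale of patches between attention stages could be sketched as below. The pixel-shuffle expansion and the strided-convolution merge are one plausible realization of the claimed first and second convolution operations, not necessarily the one described in the patent.

class PatchSubdivide(nn.Module):
    # Subdivides each patch into a factor x factor grid of finer patches:
    # the output sequence has more patches, each covering fewer pixels.
    def __init__(self, embed_dim=256, factor=2):
        super().__init__()
        self.factor = factor
        self.conv = nn.Conv2d(embed_dim, embed_dim * factor * factor, kernel_size=1)
        self.shuffle = nn.PixelShuffle(factor)

    def forward(self, seq, grid_hw):
        h, w = grid_hw
        x = seq.transpose(1, 2).reshape(seq.size(0), -1, h, w)
        x = self.shuffle(self.conv(x))        # (B, C, factor*h, factor*w)
        return x.flatten(2).transpose(1, 2), (self.factor * h, self.factor * w)

class PatchCombine(nn.Module):
    # Merges each factor x factor neighborhood of fine patches into one
    # coarser patch: fewer patches, each covering more pixels.
    def __init__(self, embed_dim=256, factor=2):
        super().__init__()
        self.factor = factor
        self.conv = nn.Conv2d(embed_dim, embed_dim, kernel_size=factor, stride=factor)

    def forward(self, seq, grid_hw):
        h, w = grid_hw
        x = seq.transpose(1, 2).reshape(seq.size(0), -1, h, w)
        x = self.conv(x)                      # (B, C, h/factor, w/factor)
        return x.flatten(2).transpose(1, 2), (h // self.factor, w // self.factor)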
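
Finally, the three attention stages, the intermediate subdivision and combination, and the reconstruction of the modified image could be tied together roughly as follows, reusing the modules sketched above. The output head that maps each coarse patch back to a block of pixels is again an illustrative assumption rather than the patented method.

class MultiStageInpainter(nn.Module):
    def __init__(self, embed_dim=256, patch_size=16):
        super().__init__()
        self.patch_size = patch_size
        self.embed = PatchEmbed(3, embed_dim, patch_size)
        self.stage1 = AttentionStage(embed_dim)    # first attention network
        self.subdivide = PatchSubdivide(embed_dim)
        self.stage2 = AttentionStage(embed_dim)    # second attention network
        self.combine = PatchCombine(embed_dim)
        self.stage3 = AttentionStage(embed_dim)    # third attention network
        # Maps each coarse patch back to a patch_size x patch_size RGB block.
        self.to_pixels = nn.Linear(embed_dim, 3 * patch_size * patch_size)

    def forward(self, image):
        b, _, H, W = image.shape
        ps = self.patch_size
        h, w = H // ps, W // ps
        seq1 = self.stage1(self.embed(image))        # first output sequence
        seq2, fine_hw = self.subdivide(seq1, (h, w))
        seq2 = self.stage2(seq2)                     # second output sequence
        seq3, _ = self.combine(seq2, fine_hw)
        seq3 = self.stage3(seq3)                     # third output sequence
        pixels = self.to_pixels(seq3)                # (B, h*w, 3*ps*ps)
        pixels = pixels.reshape(b, h, w, 3, ps, ps).permute(0, 3, 1, 4, 2, 5)
        return pixels.reshape(b, 3, H, W)            # modified image

# Example: run the sketch on a 256 x 256 image whose masked region is to be filled.
model = MultiStageInpainter()
masked_image = torch.rand(1, 3, 256, 256)
modified_image = model(masked_image)               # (1, 3, 256, 256)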