US 12,462,348 B2
Multimodal diffusion models
Cusuh Ham, Marietta, GA (US); Tobias Hinz, Campbell, CA (US); Jingwan Lu, Sunnyvale, CA (US); Krishna Kumar Singh, San Jose, CA (US); and Zhifei Zhang, San Jose, CA (US)
Assigned to ADOBE INC., San Jose, CA (US)
Filed by ADOBE INC., San Jose, CA (US)
Filed on Feb. 6, 2023, as Appl. No. 18/165,141.
Prior Publication US 2024/0265505 A1, Aug. 8, 2024
Int. Cl. G06T 5/70 (2024.01)
CPC G06T 5/70 (2024.01) [G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A method comprising:
obtaining a noise input and guidance information indicating a target image control;
generating, using an image generation model, a noise prediction based on the noise input, wherein the noise prediction comprises a prediction of noise to remove from the noise input;
generating noise modulation parameters based on the guidance information using a conditioning network, wherein the noise modulation parameters represent the target image control;
combining the noise prediction and the noise modulation parameters to obtain a modified noise prediction, wherein the modified noise prediction comprises a modified prediction of noise to remove from the noise input based on the target image control; and
generating, using the image generation model, a synthetic image based on the modified noise prediction, wherein the synthetic image depicts a scene based on the target image control.
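
The following is a minimal, illustrative PyTorch sketch of the data flow recited in claim 1: a noise prediction from an image generation model, noise modulation parameters from a conditioning network, their combination into a modified noise prediction, and a denoising update toward the synthetic image. The toy module architectures, the scale-and-shift form of the modulation, the guidance embedding, and the simplified single DDPM-style step are assumptions made for illustration only; they are not the patented implementation.

```python
# Illustrative sketch only. Assumptions: toy CNN as the "image generation
# model", toy MLP as the conditioning network, scale-and-shift modulation,
# and one simplified DDPM-style update. Not the patented implementation.
import torch
import torch.nn as nn


class ToyNoisePredictor(nn.Module):
    """Stand-in for the image generation model's noise-prediction network."""

    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, noisy_image: torch.Tensor) -> torch.Tensor:
        # Predict the noise to remove from the noise input.
        return self.net(noisy_image)


class ToyConditioningNetwork(nn.Module):
    """Stand-in conditioning network mapping guidance information
    (e.g., an embedding of a text prompt or sketch) to noise
    modulation parameters representing the target image control."""

    def __init__(self, guidance_dim: int = 16, channels: int = 3):
        super().__init__()
        self.to_scale = nn.Linear(guidance_dim, channels)
        self.to_shift = nn.Linear(guidance_dim, channels)

    def forward(self, guidance: torch.Tensor):
        # Per-channel scale and shift (assumed modulation form).
        scale = self.to_scale(guidance)[:, :, None, None]
        shift = self.to_shift(guidance)[:, :, None, None]
        return scale, shift


def denoise_step(noise_input, guidance, predictor, cond_net, alpha: float = 0.9):
    """One modulated denoising step following the flow of claim 1."""
    # 1) Noise prediction from the image generation model.
    noise_pred = predictor(noise_input)
    # 2) Noise modulation parameters from the conditioning network.
    scale, shift = cond_net(guidance)
    # 3) Combine to obtain the modified noise prediction.
    modified_noise_pred = scale * noise_pred + shift
    # 4) Remove the modified noise (simplified DDPM-style update).
    return (noise_input - (1 - alpha) ** 0.5 * modified_noise_pred) / alpha ** 0.5


if __name__ == "__main__":
    torch.manual_seed(0)
    x_t = torch.randn(1, 3, 64, 64)   # noise input
    guidance = torch.randn(1, 16)     # guidance information (hypothetical embedding)
    x_prev = denoise_step(x_t, guidance, ToyNoisePredictor(), ToyConditioningNetwork())
    print(x_prev.shape)               # torch.Size([1, 3, 64, 64])
```

In practice the step above would be iterated over a full noise schedule, with the conditioning network re-applied at each step so the target image control shapes the entire reverse diffusion trajectory; the single-step form here is only to keep the sketch self-contained.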