| CPC G11B 27/031 (2013.01) [G06F 40/40 (2020.01)] | 30 Claims |

|
1. A method for adapting a text-to-image (T2I) diffusion model to edit video content based on a text prompt, the method comprising:
a. receiving, by at least one processor, a video comprising a plurality of frames and a text prompt specifying modifications to visual elements in the video;
b. performing spectral decomposition, by the at least one processor, on at least one weight matrix of the pre-trained T2I diffusion model to separate each matrix into a set of singular values and corresponding singular vectors;
c. generating, by the at least one processor, a spectral shift parameter matrix by selectively adjusting only the singular values based on the text prompt, while maintaining the singular vectors unmodified;
d. applying, by the at least one processor, a spectral shift regularizer to the spectral shift parameter matrix, wherein the spectral shift regularizer imposes more restricted adjustments to singular values with larger magnitudes and allows comparatively relaxed adjustments to singular values with smaller magnitudes;
e. adapting, by the at least one processor, the pre-trained T2I diffusion model by incorporating the spectral shift parameter matrix, thereby creating an adapted model configured to modify specific visual elements within the video according to the text prompt; and
f. outputting, by the at least one processor, an edited video in which the visual elements specified by the text prompt are modified in the plurality of frames, while preserving non-targeted visual elements and maintaining temporal coherence across frames.
|
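Steps b through e of claim 1 can be illustrated with a minimal numerical sketch. It is not the claimed implementation: it assumes NumPy, treats the spectral shift parameter matrix as the vector of its diagonal entries, and uses hypothetical function names (`spectral_decompose`, `apply_spectral_shift`, `spectral_shift_regularizer`). The magnitude-weighted quadratic penalty and the clamping of shifted singular values at zero are assumptions chosen only to show one way a regularizer could restrict changes to larger singular values more than to smaller ones. How the shifts would be derived from the text prompt is outside the scope of this sketch.

```python
import numpy as np


def spectral_decompose(weight):
    """Step b: factor a weight matrix as U @ diag(s) @ Vt (thin SVD)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return u, s, vt


def apply_spectral_shift(u, s, vt, delta):
    """Steps c and e: shift only the singular values, keep U and Vt fixed.

    `delta` holds one shift per singular value. Clamping at zero keeps the
    shifted values non-negative (an assumption, not stated in the claim).
    """
    shifted = np.maximum(s + delta, 0.0)
    # Scaling the columns of U by `shifted` equals U @ diag(shifted).
    return (u * shifted) @ vt


def spectral_shift_regularizer(s, delta, eps=1e-8):
    """Step d: penalty that grows with the magnitude of the singular value.

    Each squared shift is weighted by its singular value relative to the
    largest one, so the same shift costs more on a large singular value
    than on a small one (illustrative weighting, assumed here).
    """
    weights = s / (s.max() + eps)
    return float(np.sum(weights * delta ** 2))
```

In a training loop, a penalty like this would typically be added to the fine-tuning loss while only `delta` is updated, so the singular vectors of the pre-trained weights remain unmodified, as step c requires.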