| CPC G06T 11/00 (2013.01) [G06T 7/11 (2017.01); G06T 2211/441 (2023.08)] | 20 Claims |

|
1. A method for generating a video, comprising:
obtaining, by a device, a visual token for generating an image frame in the video;
obtaining, by the device, a control token for constraining position information of an object in the image frame, the control token being generated, by the device, based on a bounding box or a motion trajectory and indicating motion control information for the object, wherein the bounding box or the motion trajectory is provided by a user for constraining a position of the object in the image frame to be generated; and
generating, by a video generation model with a motion control module deployed on the device, the image frame in the video based on the visual token and the control token, wherein the object in the image frame satisfies the position information,
wherein the visual token is a first visual token, and generating the image frame in the video based on the visual token and the control token comprises:
generating, by the motion control module, a second visual token based on the first visual token and the control token, the second visual token comprising motion control information provided by the bounding box or the motion trajectory; and
generating the image frame in the video based on the second visual token.
|
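The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: every name below (`control_token_from_box`, `motion_control_module`, `generate_frame`) is hypothetical, tokens are modeled as plain lists of floats, the fusion step is assumed to be simple concatenation, and the "video generation model" is a toy decoder that marks the object region on a small grid. Only the bounding-box alternative is shown; a motion trajectory would need its own encoding.

```python
from typing import List, Sequence, Tuple

# Assumption: a box is (x_min, y_min, x_max, y_max), normalized to [0, 1].
BoundingBox = Tuple[float, float, float, float]


def control_token_from_box(box: BoundingBox) -> List[float]:
    """Encode a user-provided bounding box as a control token.

    Hypothetical encoding: (center_x, center_y, width, height).
    """
    x0, y0, x1, y1 = box
    if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
        raise ValueError("box must be normalized with x0 < x1 and y0 < y1")
    return [(x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0]


def motion_control_module(first_visual_token: Sequence[float],
                          control_token: Sequence[float]) -> List[float]:
    """Produce the second visual token from the first visual token and the
    control token. Assumption: fusion is plain concatenation, so the second
    token carries the motion control information in its last four entries.
    """
    return list(first_visual_token) + list(control_token)


def generate_frame(second_visual_token: Sequence[float],
                   height: int = 8, width: int = 8) -> List[List[int]]:
    """Toy stand-in for the video generation model: draw the object as 1s
    wherever a pixel center falls inside the box carried by the token.
    """
    cx, cy, w, h = second_visual_token[-4:]
    frame = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            x = (col + 0.5) / width   # normalized pixel-center coordinates
            y = (row + 0.5) / height
            if abs(x - cx) <= w / 2 and abs(y - cy) <= h / 2:
                frame[row][col] = 1
    return frame


# Usage: one frame whose object is constrained to the central box.
first_visual_token = [0.1, 0.2, 0.3]          # placeholder visual features
control_token = control_token_from_box((0.25, 0.25, 0.75, 0.75))
second_visual_token = motion_control_module(first_visual_token, control_token)
frame = generate_frame(second_visual_token)
```

In this sketch the object in the generated frame satisfies the position information by construction, because the decoder reads the box directly from the second visual token. A learned model would instead condition on the fused token, and the constraint would hold only to the extent the model honors it.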