US 12,412,319 B2
Method and apparatus for generating video, electronic device, and computer program product
Jiawei Wang, Beijing (CN); Yuchen Zhang, Beijing (CN); Jiaxin Zou, Beijing (CN); Yan Zeng, Beijing (CN); Guoqiang Wei, Beijing (CN); Liping Yuan, Beijing (CN); and Hang Li, Beijing (CN)
Assigned to Beijing Youzhuju Network Technology Co., Ltd., Beijing (CN)
Filed by Beijing Youzhuju Network Technology Co., Ltd., Beijing (CN)
Filed on Jul. 16, 2024, as Appl. No. 18/774,561.
Claims priority of application No. 202410132635.0 (CN), filed on Jan. 30, 2024.
Prior Publication US 2025/0245867 A1, Jul. 31, 2025
Int. Cl. G06T 11/00 (2006.01); G06T 7/11 (2017.01)

CPC G06T 11/00 (2013.01) [G06T 7/11 (2017.01); G06T 2211/441 (2023.08)]

20 Claims

1. A method for generating a video, comprising:

obtaining, by a device, a visual token for generating an image frame in the video;

obtaining, by the device, a control token for constraining position information of an object in the image frame, the control token being generated, by the device, based on a bounding box or a motion trajectory and indicating motion control information for the object, wherein the bounding box or the motion trajectory is provided by a user for constraining a position of the object in the image frame to be generated; and

generating, by a video generation model with a motion control module deployed on the device, the image frame in the video based on the visual token and the control token, wherein the object in the image frame satisfies the position information,

wherein the visual token is a first visual token, and generating the image frame in the video based on the visual token and the control token comprises:

generating, by the motion control module, a second visual token based on the first visual token and the control token, the second visual token comprising motion control information provided by the bounding box or the motion trajectory; and

generating the image frame in the video based on the second visual token.
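The flow of claim 1 runs from a first visual token, through a control token derived from a user's bounding box or motion trajectory, to a second visual token that carries the motion control information. A minimal sketch of that flow follows. It rests on several assumptions. A frame's visual tokens are treated as a rows x cols grid of patch vectors. A trajectory is treated as one normalized (x, y) point per frame. The motion control module is modeled as simple additive conditioning on the patches that the box covers. The function names bbox_control_token, trajectory_to_box and motion_control_module, the fixed box encoding, and the additive fusion are hypothetical illustrations. The claim does not specify how the control token is computed or how it is fused.

```python
from typing import List, Sequence, Tuple

# Hypothetical types for illustration: a token is a flat feature vector,
# and one frame's visual tokens form a rows x cols grid of patch tokens.
Token = List[float]
TokenGrid = List[List[Token]]
Box = Tuple[float, float, float, float]  # normalized (x0, y0, x1, y1)


def bbox_control_token(box: Box, dim: int) -> Token:
    """Encode a user-supplied bounding box as a control token of length `dim`.

    Assumption: a fixed hand-written encoding (corners, centre, size) padded
    with zeros or truncated to `dim`; a trained model would learn this embedding.
    """
    x0, y0, x1, y1 = box
    if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
        raise ValueError("box must be normalized with x0 < x1 and y0 < y1")
    feats = [x0, y0, x1, y1, (x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0]
    return (feats + [0.0] * dim)[:dim]


def trajectory_to_box(points: Sequence[Tuple[float, float]], frame_idx: int,
                      half_size: float = 0.125) -> Box:
    """Turn the trajectory point for one frame into a small box around it.

    Assumption: one normalized (x, y) point per frame; indices past the end
    of the trajectory reuse its last point. The box is clipped to [0, 1].
    """
    if half_size <= 0:
        raise ValueError("half_size must be positive")
    x, y = points[min(frame_idx, len(points) - 1)]
    return (max(0.0, x - half_size), max(0.0, y - half_size),
            min(1.0, x + half_size), min(1.0, y + half_size))


def motion_control_module(first_tokens: TokenGrid, box: Box, control: Token,
                          scale: float = 1.0) -> TokenGrid:
    """Produce second visual tokens from first visual tokens and a control token.

    Assumption: additive conditioning applied only to patches whose centre
    lies inside the box; real models may use attention or adapters instead.
    The input grid is not modified.
    """
    rows, cols = len(first_tokens), len(first_tokens[0])
    x0, y0, x1, y1 = box
    second: TokenGrid = []
    for r, row in enumerate(first_tokens):
        new_row: List[Token] = []
        for c, tok in enumerate(row):
            if len(tok) != len(control):
                raise ValueError("visual and control tokens must share a dimension")
            cx, cy = (c + 0.5) / cols, (r + 0.5) / rows  # normalized patch centre
            if x0 <= cx <= x1 and y0 <= cy <= y1:
                new_row.append([v + scale * k for v, k in zip(tok, control)])
            else:
                new_row.append(list(tok))  # copy patches outside the box unchanged
        second.append(new_row)
    return second
```

As an example, take a 2 x 2 patch grid and the box (0.0, 0.0, 0.5, 0.5). Only the top-left patch, whose centre is (0.25, 0.25), receives the control token, and the other three patches are copied unchanged. Decoding the second visual tokens into an image frame depends on the specific video generation model and is deliberately left out of this sketch.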