US 12,225,271 B2
Video generation method and related apparatus
Bin Shao, Shenzhen (CN); Jun Yue, Shenzhen (CN); Li Qian, Shenzhen (CN); Songcen Xu, Shenzhen (CN); Xueyan Huang, Shenzhen (CN); and Yajiao Liu, Shenzhen (CN)
Assigned to HUAWEI TECHNOLOGIES CO., LTD., Shenzhen (CN)
Filed by HUAWEI TECHNOLOGIES CO., LTD., Guangdong (CN)
Filed on Nov. 29, 2022, as Appl. No. 18/070,689.
Application 18/070,689 is a continuation of application No. PCT/CN2021/097047, filed on May 29, 2021.
Claims priority of application No. 202010480675 (CN), filed on May 30, 2020.
Prior Publication US 2023/0089566 A1, Mar. 23, 2023
Int. Cl. H04N 21/80 (2011.01); G06T 7/00 (2017.01); G06V 10/44 (2022.01); G06V 20/70 (2022.01); G06V 40/16 (2022.01)
CPC H04N 21/80 (2013.01) [G06T 7/0002 (2013.01); G06V 10/44 (2022.01); G06V 20/70 (2022.01); G06V 40/172 (2022.01); G06T 2207/30168 (2013.01); G06T 2207/30201 (2013.01)] 18 Claims
OG exemplary drawing
 
1. A video generation method, comprising:
receiving a video generation instruction, and obtaining text information and image information in response to the video generation instruction, wherein the text information comprises one or more keywords, the image information comprises N images, and N is a positive integer greater than or equal to 1;
obtaining, based on the one or more keywords, at least one image feature, of the N images, that corresponds to the one or more keywords; and
inputting the one or more keywords and the at least one image feature of the N images into a target generator network to generate a target video, wherein the target video comprises M images, the M images are generated based on the at least one image feature and correspond to the one or more keywords, and M is a positive integer greater than 1,
wherein inputting the one or more keywords and the at least one image feature of the N images into the target generator network to generate the target video comprises:
extracting a first spatial variable that is in vector space and corresponds to each of the one or more keywords;
extracting second spatial variables that are in vector space and respectively correspond to the at least one image feature of the N images; and
inputting the first spatial variable and the second spatial variables into the target generator network to generate the target video.
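The following is a minimal, self-contained Python sketch of the pipeline described in claim 1, not the patented implementation: keywords are mapped to a first spatial variable in vector space, the N images are mapped to second spatial variables, and both are fed to a generator network that outputs M frames. All module names (KeywordEncoder, ImageFeatureEncoder, TargetGenerator), layer choices, dimensions, and the frame count M are hypothetical placeholders chosen only for illustration.

import torch
import torch.nn as nn


class KeywordEncoder(nn.Module):
    """Maps keyword token ids to a first spatial variable in vector space."""
    def __init__(self, vocab_size=10_000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)

    def forward(self, keyword_ids):                  # (batch, num_keywords)
        return self.embed(keyword_ids).mean(dim=1)   # (batch, dim)


class ImageFeatureEncoder(nn.Module):
    """Maps the N input images to second spatial variables in vector space."""
    def __init__(self, dim=256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim),
        )

    def forward(self, images):                       # (batch, N, 3, H, W)
        b, n = images.shape[:2]
        feats = self.cnn(images.flatten(0, 1))       # (batch*N, dim)
        return feats.view(b, n, -1).mean(dim=1)      # (batch, dim)


class TargetGenerator(nn.Module):
    """Generates M frames from the concatenated spatial variables."""
    def __init__(self, dim=256, m_frames=16, frame_hw=64):
        super().__init__()
        self.m, self.hw = m_frames, frame_hw
        self.fc = nn.Linear(2 * dim, m_frames * 3 * frame_hw * frame_hw)

    def forward(self, text_var, image_var):
        z = torch.cat([text_var, image_var], dim=-1)
        video = self.fc(z).view(-1, self.m, 3, self.hw, self.hw)
        return torch.tanh(video)                     # (batch, M, 3, H, W)


# Usage with random placeholder inputs.
keywords = torch.randint(0, 10_000, (1, 3))          # one or more keywords
images = torch.rand(1, 2, 3, 64, 64)                 # N = 2 input images
text_var = KeywordEncoder()(keywords)                 # first spatial variable
image_var = ImageFeatureEncoder()(images)             # second spatial variables
video = TargetGenerator()(text_var, image_var)        # target video of M frames
print(video.shape)                                    # torch.Size([1, 16, 3, 64, 64])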