| CPC G06V 10/764 (2022.01) [G06F 18/2148 (2023.01); G06F 18/217 (2023.01); G06N 20/00 (2019.01); G06V 10/7747 (2022.01); G06V 40/10 (2022.01); G06V 40/171 (2022.01); H04N 5/265 (2013.01)] | 17 Claims |

|
1. A model training method, the method comprising:
obtaining an image sample set and brief-prompt information, the image sample set comprising at least one image sample, the brief-prompt information representing key-point information of a to-be-trained object in the at least one image sample, wherein the at least one image sample includes a plurality of consecutive image samples, and the plurality of consecutive image samples are used for forming a video sample;
generating a content mask set according to the image sample set and the brief-prompt information, the content mask set comprising at least one content mask, the at least one content mask being obtained by extending outward a region identified according to the brief-prompt information in the at least one image sample;
generating a to-be-trained image set according to the content mask set, the to-be-trained image set comprising at least one to-be-trained image, the at least one to-be-trained image being in correspondence to the at least one image sample;
obtaining, based on the image sample set and the to-be-trained image set, a predicted image set through a to-be-trained information synthesis model, the predicted image set comprising at least one predicted image, the at least one predicted image being in correspondence to the at least one image sample; and
training, based on the predicted image set and the image sample set, the to-be-trained information synthesis model by using a target loss function, to obtain an information synthesis model, comprising:
determining a first loss function according to N frames of predicted images in the predicted image set, N frames of to-be-trained images in the to-be-trained image set, and N frames of image samples in the image sample set, N being an integer greater than 1, wherein the first loss function is determined based on an output of a generator of the to-be-trained information synthesis model when inputting a superposition of (N-1) frames of to-be-trained images and an Nth frame of to-be-trained image to the generator;
determining a second loss function according to N frames of predicted images in the predicted image set and N frames of image samples in the image sample set;
determining the target loss function according to the first loss function and the second loss function;
iteratively updating a model parameter of the to-be-trained information synthesis model according to the target loss function; and
generating, in a case that an iteration end condition is satisfied, the information synthesis model according to the model parameter of the to-be-trained information synthesis model.
|
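
For illustration only, the mask-generation and loss-combination steps recited in claim 1 might be sketched as follows. The claim does not say how the region is identified or extended, how the to-be-trained image is derived from the mask, or how the two losses are combined. This sketch therefore assumes a bounding box around the key points padded by a fixed margin, a to-be-trained image formed by covering the masked region, and a weighted sum of the losses. The names `content_mask`, `apply_mask`, `target_loss`, `margin`, `fill`, and `weight` are hypothetical and do not come from the patent.

```python
from typing import List, Tuple

Point = Tuple[int, int]   # (row, col) of one key point; hypothetical representation
Grid = List[List[int]]    # a single-channel image or a binary mask


def content_mask(height: int, width: int, key_points: List[Point], margin: int) -> Grid:
    """Return a binary mask covering the key-point bounding box extended outward.

    Assumption: "extending outward" is modeled as padding the bounding box of
    the key points by `margin` pixels on every side, clipped to the image.
    """
    rows = [r for r, _ in key_points]
    cols = [c for _, c in key_points]
    top = max(min(rows) - margin, 0)
    bottom = min(max(rows) + margin, height - 1)
    left = max(min(cols) - margin, 0)
    right = min(max(cols) + margin, width - 1)
    return [
        [1 if top <= r <= bottom and left <= c <= right else 0 for c in range(width)]
        for r in range(height)
    ]


def apply_mask(image: Grid, mask: Grid, fill: int = 0) -> Grid:
    """Build a to-be-trained image by covering the masked region with `fill`.

    Assumption: the claim only says the to-be-trained image is generated
    "according to" the content mask; covering the region is one possible reading.
    """
    return [
        [fill if m else px for px, m in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


def target_loss(first_loss: float, second_loss: float, weight: float = 1.0) -> float:
    """Combine the two losses as a weighted sum (an assumed combination rule)."""
    return first_loss + weight * second_loss


# Usage: one key point at the center of a 5x5 sample, extended by one pixel.
sample = [[7] * 5 for _ in range(5)]
mask = content_mask(5, 5, [(2, 2)], margin=1)
to_be_trained = apply_mask(sample, mask)
```

In this sketch the mask is computed per image sample, so for a video sample it would be called once per frame with that frame's key points.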