CPC G06V 20/46 (2022.01) [G06F 16/739 (2019.01); G06F 18/22 (2023.01)] — 12 Claims
1. A method for determining a video cover image, comprising:
obtaining a candidate image set containing a plurality of image frames to be processed by determining the plurality of image frames to be processed from a video to be processed, each of the plurality of image frames to be processed containing at least one target object;
inputting each of the plurality of image frames to be processed into an image scoring network to obtain a black edge size of each of the plurality of image frames to be processed, a brightness score of each of the plurality of image frames to be processed, and a definition score of each of the plurality of image frames to be processed, wherein the image scoring network is obtained based on neural network training;
weighting the black edge size of each of the plurality of image frames to be processed, the brightness score of each of the plurality of image frames to be processed, and the definition score of each of the plurality of image frames to be processed;
summing the weighted black edge size of each of the plurality of image frames to be processed, the weighted brightness score of each of the plurality of image frames to be processed, and the weighted definition score of each of the plurality of image frames to be processed to obtain an image feature score of each of the plurality of image frames to be processed, wherein image features of an image frame to be processed comprise: a black edge, a brightness, and a definition, and the black edge is a black portion of the image frame to be processed other than the picture content;
inputting each of the plurality of image frames to be processed into an object scoring network to obtain a number of person images in each of the plurality of image frames to be processed, a location of a person image in the image frame to be processed, a size of the person image, a definition score of the person image, an eye state score of a person in the person image, an expression score of the person in the person image, and a pose score of the person in the person image, wherein the object scoring network is obtained based on neural network training;
obtaining an object feature score of each of the plurality of image frames to be processed based on the number of the person images in each of the plurality of image frames to be processed, the location of the person image in the image frame to be processed, the size of the person image, the definition score of the person image, the eye state score of the person in the person image, the expression score of the person in the person image, and the pose score of the person in the person image;
inputting each of the plurality of image frames to be processed into an aesthetic scoring network to obtain a composition score of each of the plurality of image frames to be processed and a color richness score of each of the plurality of image frames to be processed, wherein the aesthetic scoring network is obtained based on neural network training;
obtaining an aesthetic feature score of each of the plurality of image frames to be processed based on the composition score of each of the plurality of image frames to be processed and the color richness score of each of the plurality of image frames to be processed;
obtaining a target score of each of the plurality of image frames to be processed based on the image feature score, the object feature score, and the aesthetic feature score; and
sorting a plurality of target scores of the plurality of image frames to be processed according to a set order to obtain a sorting result, and determining the video cover image of the video to be processed from the plurality of image frames to be processed according to the sorting result.
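By way of illustration, the image feature score recited in claim 1 is a weighted sum of a frame's black edge size, brightness score, and definition score. The sketch below shows that computation in Python; the weight values, and the negative sign on the black-edge weight (assuming a larger black edge should lower the score), are illustrative assumptions, since the claim recites weighting and summing without fixing any weights.

```python
# Minimal sketch of the image feature score in claim 1: a weighted sum of
# the black edge size, brightness score, and definition score of a frame.
# The weight values (and the negative black-edge weight, which assumes a
# larger black edge should reduce the score) are illustrative assumptions.

W_BLACK_EDGE = -0.2   # assumed: larger black edges reduce the score
W_BRIGHTNESS = 0.4    # assumed weight for the brightness score
W_DEFINITION = 0.4    # assumed weight for the definition score

def image_feature_score(black_edge_size: float,
                        brightness_score: float,
                        definition_score: float) -> float:
    """Weight each image feature value and sum the weighted values."""
    return (W_BLACK_EDGE * black_edge_size
            + W_BRIGHTNESS * brightness_score
            + W_DEFINITION * definition_score)
```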
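The claim leaves open how the per-person outputs of the object scoring network and the two outputs of the aesthetic scoring network are combined ("obtaining ... based on"). One plausible reading, sketched below with assumed weights and helper names of my own choosing (PersonScores, object_feature_score, aesthetic_feature_score are hypothetical), is again a weighted aggregation over the quantities recited in the claim.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PersonScores:
    """Per-person outputs of the object scoring network recited in claim 1."""
    location: float      # assumed: a centrality score derived from the person's location
    size: float
    definition: float
    eye_state: float
    expression: float
    pose: float

def object_feature_score(persons: List[PersonScores]) -> float:
    """Assumed aggregation: average a weighted sum of the per-person scores,
    then add a small term for the number of person images. The claim only
    says the score is obtained 'based on' these quantities."""
    if not persons:
        return 0.0
    per_person = [
        0.2 * p.location + 0.1 * p.size + 0.2 * p.definition
        + 0.2 * p.eye_state + 0.2 * p.expression + 0.1 * p.pose
        for p in persons
    ]
    count_bonus = min(len(persons), 3) * 0.05   # assumed mild reward for more persons
    return sum(per_person) / len(per_person) + count_bonus

def aesthetic_feature_score(composition_score: float,
                            color_richness_score: float) -> float:
    """Assumed equal weighting of composition and color richness."""
    return 0.5 * composition_score + 0.5 * color_richness_score
```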
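Finally, the target score of each candidate frame is obtained from the three feature scores, the target scores are sorted in a set order, and the cover image is chosen from the sorting result. A minimal sketch, assuming an equally weighted combination and a descending sort (the claim fixes neither):

```python
def target_score(image_score: float,
                 object_score: float,
                 aesthetic_score: float) -> float:
    """Assumed equal weighting of the three feature scores from claim 1."""
    return (image_score + object_score + aesthetic_score) / 3.0

def select_cover(frames, scores):
    """Sort candidate frames by target score in descending order (an assumed
    'set order') and return the top-ranked frame as the video cover image."""
    ranked = sorted(zip(scores, range(len(frames))), reverse=True)
    best_index = ranked[0][1]
    return frames[best_index]
```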