US 12,464,174 B1
Objective video quality assessment models based on bitstream, and additional pixel domain features
Yongjun Wu, Bellevue, WA (US); Hai Wei, Seattle, WA (US); Sriram Sethuraman, Bangalore (IN); Yixu Chen, Seattle, WA (US); and Zaixi Shang, Austin, TX (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Dec. 14, 2022, as Appl. No. 18/066,202.
Int. Cl. H04N 21/234 (2011.01); G06V 10/70 (2022.01); H04N 19/124 (2014.01); H04N 19/139 (2014.01); H04N 21/2187 (2011.01)

CPC H04N 21/23418 (2013.01) [G06V 10/70 (2022.01); H04N 19/124 (2014.11); H04N 19/139 (2014.11); H04N 21/2187 (2013.01)]

20 Claims

1. A system, comprising:

one or more processors; and

one or more memories storing computer-executable instructions that, when executed by the one or more processors, configure the one or more processors for:

receiving video content comprising video frames;

determining metadata features comprising a quantization parameter (QP) or Constant Rate Factor (CRF), a codec, a bit depth, an average bitrate, a frame rate, and a resolution of the video content;

determining a first video quality score of the video content using a first trained random forest model by providing the metadata features as inputs to the first trained random forest model;

determining a second video quality score using a second machine learning model and a third video quality score using a third machine learning model, wherein the second machine learning model is trained on video content having spatial information above a threshold value and the third machine learning model is trained on video content having spatial information below the threshold value;

determining average QP, spatial and temporal motion weighted QP, frame average motion magnitude, motion direction, motion randomness, encoding block statistics, frame size, local frequency-coefficients-weighted, and variance-weighted encoding error of the video frames during an encoding or a decoding of the video content, wherein the temporal motion weighted QP is based on a normalized average motion vector length for each frame of video content;

determining a residual prediction of the first video quality score based on the first video quality score, the second video quality score, the third video quality score, average QP, spatial motion weighted QP, frame average motion magnitude, motion direction, motion randomness, encoding block statistics, frame size, local frequency-coefficients-weighted, and variance-weighted encoding error of the video frames using a second trained random forest model; and

determining a second video quality score based on the first video quality score and the residual prediction.
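
For illustration only, the two-stage flow recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation. The claim does not give the weighting formula, the normalization, the inputs of the second and third models, or how the residual is combined with the first score. Here the per-frame average motion vector length is normalized by the largest value in the sequence and used directly as the weight, the second and third models take a hypothetical `si_features` vector, and the final score is the first score plus the residual. All function and parameter names are hypothetical, and the models are passed in as plain callables rather than trained random forests.

```python
from typing import Callable, Sequence

# A model is any callable mapping a feature vector to a score (stand-in for a
# trained random forest or other trained model).
Model = Callable[[Sequence[float]], float]


def temporal_motion_weighted_qp(
    frame_qps: Sequence[float], frame_mv_lengths: Sequence[float]
) -> float:
    """Average the per-frame QP, weighted by normalized motion vector length.

    Assumption: each frame's average motion vector length is normalized by
    the largest such length in the sequence and used directly as the weight.
    """
    if not frame_qps or len(frame_qps) != len(frame_mv_lengths):
        raise ValueError("need one QP and one motion vector length per frame")
    peak = max(frame_mv_lengths)
    if peak <= 0:
        # No motion anywhere: fall back to the plain average QP.
        return sum(frame_qps) / len(frame_qps)
    weights = [mv / peak for mv in frame_mv_lengths]
    return sum(w * qp for w, qp in zip(weights, frame_qps)) / sum(weights)


def predict_quality(
    metadata_features: Sequence[float],
    si_features: Sequence[float],
    bitstream_features: Sequence[float],
    first_model: Model,
    high_si_model: Model,
    low_si_model: Model,
    residual_model: Model,
) -> float:
    """Two-stage prediction: a base score plus a predicted residual."""
    # Stage 1: base score from metadata (QP/CRF, codec, bit depth, ...).
    first_score = first_model(metadata_features)
    # Scores from the models trained on high and low spatial information.
    second_score = high_si_model(si_features)
    third_score = low_si_model(si_features)
    # Stage 2: residual predicted from the three scores and bitstream features.
    residual_inputs = [first_score, second_score, third_score, *bitstream_features]
    residual = residual_model(residual_inputs)
    # Assumption: the residual is combined additively with the first score.
    return first_score + residual
```

With trained models in place of the callables, `bitstream_features` would carry the quantities listed in the claim, such as average QP, motion weighted QP, motion magnitude, and frame size.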