US 12,230,024 B2
Systems and techniques for retraining models for video quality assessment and for transcoding using the retrained models
Yilin Wang, Sunnyvale, CA (US); Hossein Talebi, San Jose, CA (US); Peyman Milanfar, Menlo Park, CA (US); Feng Yang, Sunnyvale, CA (US); and Balineedu Adsumilli, Sunnyvale, CA (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Appl. No. 17/762,289
Filed by Google LLC, Mountain View, CA (US)
PCT Filed Nov. 26, 2019, PCT No. PCT/US2019/063191
§ 371(c)(1), (2) Date Mar. 21, 2022,
PCT Pub. No. WO2021/107922, PCT Pub. Date Jun. 3, 2021.
Prior Publication US 2022/0415039 A1, Dec. 29, 2022
Int. Cl. G06V 10/98 (2022.01); G06N 3/045 (2023.01); G06V 10/82 (2022.01); G06V 20/40 (2022.01)
CPC G06V 10/993 (2022.01) [G06N 3/045 (2023.01); G06V 10/82 (2022.01); G06V 20/46 (2022.01)] 20 Claims
OG exemplary drawing
 
1. A method for using transfer learning to retrain and deploy a machine learning model for video quality assessment, the method comprising:
retraining a machine learning model to produce a second machine learning model for technical content assessment using a first retraining data set, wherein the machine learning model is initially trained for image object detection;
retraining the second machine learning model to produce a third machine learning model for video quality assessment using a second retraining data set different from the first retraining data set, wherein the second retraining data set corresponds to first user generated video content, wherein each sample of the second retraining data set includes a pair of video frames including a first video frame at a first quality level and a second video frame at a second quality level; and
transcoding an input video stream of second user generated video content using a transcoding pipeline including a first stage that uses the third machine learning model to select parameters based on a quality level determined for the input video stream, a second stage that transcodes the input video stream into a mezzanine format, and a third stage that transcodes the input video stream from the mezzanine format into transcoded content according to the selected parameters, wherein the first stage completes processing of the input video stream before the second stage completes processing of the input video stream.
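The retraining chain in claim 1 can be illustrated with a toy sketch: a model with a shared backbone is first "pretrained" for one task, then retrained twice, each time keeping the backbone (the transfer-learning step) while fitting a fresh head to a new data set. Everything here — the model class, feature dimensions, and data — is an illustrative assumption, not the patent's actual network or training data.

```python
import random

random.seed(0)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class TinyModel:
    """Toy model: a shared backbone vector plus a scalar head."""

    def __init__(self, n_features):
        self.backbone = [random.uniform(-0.1, 0.1) for _ in range(n_features)]
        self.head = random.uniform(-0.1, 0.1)

    def forward(self, x):
        return self.head * dot(self.backbone, x)

    def retrain(self, data, lr=0.05, epochs=500):
        # Transfer learning: keep the backbone, attach a fresh head for
        # the new task, then fine-tune both by gradient descent on
        # squared error.
        self.head = random.uniform(-0.1, 0.1)
        for _ in range(epochs):
            for x, y in data:
                f = dot(self.backbone, x)
                err = self.head * f - y
                self.head -= lr * err * f
                for i in range(len(self.backbone)):
                    self.backbone[i] -= lr * err * self.head * x[i]
        return self

# Stand-in for the model "initially trained for image object detection".
base = TinyModel(3)

# First retraining data set -> second model ("technical content assessment").
tech_data = [([1, 0, 0], 0.2), ([0, 1, 0], 0.8), ([0, 0, 1], 0.5)]
second = base.retrain(tech_data)

# Second retraining data set -> third model ("video quality assessment").
# Each sample stands in for a frame at a known quality level, mimicking
# the claimed pairs of frames at a first and a second quality level.
pair_data = [([1, 1, 0], 0.9), ([1, 0, 1], 0.3)]
third = second.retrain(pair_data)

hi = third.forward([1, 1, 0])  # higher-quality frame features
lo = third.forward([1, 0, 1])  # lower-quality frame features
print(hi > lo)
```

After the second retraining, the toy model scores the higher-quality frame above the lower-quality one, which is the ordering a pairwise quality-assessment data set is meant to teach.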
 
9. A method for inference processing of an input video stream using a machine learning model retrained for video quality assessment, the method comprising:
retraining, using a first retraining data set, a first machine learning model initially trained for image object detection to produce a second machine learning model trained for technical content assessment;
retraining, using a second retraining data set, the second machine learning model to produce the machine learning model;
receiving the input video stream, wherein the input video stream includes video frames at an unspecified quality level; and
transcoding the input video stream using a transcoding pipeline including a first stage, a second stage, and a third stage,
wherein the first stage selects a set of one or more adaptive compression parameters to use for transcoding the input video stream based on a quality level determined for the video frames of the input video stream using the machine learning model, wherein the set of the one or more adaptive compression parameters is one of a plurality of sets of adaptive compression parameters, and wherein each of the sets of adaptive compression parameters corresponds to a different quality level to use for transcoding,
wherein the second stage transcodes the input video stream to a mezzanine format; and
wherein the third stage transcodes the input video stream in the mezzanine format according to the selected set of the one or more adaptive compression parameters, and
wherein the first stage completes processing of the input video stream before the second stage completes processing of the input video stream.
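The pipeline of claim 9 can be sketched as three stages, with the quality model replaced by a hypothetical stub. The parameter names (`crf`, `preset`), thresholds, and stage functions are assumptions for demonstration only; the key structural points from the claim are that each quality level maps to its own set of adaptive compression parameters, and that the stage-1 parameter selection finishes before the mezzanine stage does (here trivially, by running first).

```python
# One set of adaptive compression parameters per quality level; each set
# corresponds to a different quality level to use for transcoding.
PARAM_SETS = {
    "low":    {"crf": 32, "preset": "fast"},
    "medium": {"crf": 26, "preset": "medium"},
    "high":   {"crf": 20, "preset": "slow"},
}

def assess_quality(frames):
    # Stub for the retrained quality-assessment model: the mean of toy
    # per-frame scores is thresholded into a quality level.
    score = sum(frames) / len(frames)
    if score < 0.4:
        return "low"
    if score < 0.7:
        return "medium"
    return "high"

def stage1_analyze(frames):
    # First stage: select the adaptive compression parameter set from
    # the quality level determined for the input frames.
    return PARAM_SETS[assess_quality(frames)]

def stage2_mezzanine(frames):
    # Second stage: transcode the stream to a mezzanine format
    # (represented here as tagged tuples).
    return [("mezz", f) for f in frames]

def stage3_transcode(mezz, params):
    # Third stage: transcode from the mezzanine format according to the
    # selected parameter set.
    return [("out", f, params["crf"]) for _, f in mezz]

frames = [0.8, 0.9, 0.75]        # toy per-frame quality scores
params = stage1_analyze(frames)  # completes before stage 2 completes
mezz = stage2_mezzanine(frames)
output = stage3_transcode(mezz, params)
print(params, len(output))
```

Because stage 1 only needs the quality decision, it can finish while later stages are still working on the full stream, which is the ordering constraint the claim makes explicit.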
 
13. A system for using transfer learning to retrain and deploy a machine learning model for video quality assessment, the system comprising:
one or more processors configured to:
retrain a machine learning model to produce a second machine learning model for technical content assessment using a first retraining data set, wherein the machine learning model is initially trained for image object detection;
retrain the second machine learning model to produce a third machine learning model for video quality assessment using a second retraining data set different from the first retraining data set, wherein the second retraining data set corresponds to first user generated video content, wherein each sample of the second retraining data set includes a pair of video frames including a first video frame at a first quality level and a second video frame at a second quality level; and
transcode an input video stream of second user generated video content using a transcoding pipeline including a video analysis stage that uses the third machine learning model to select parameters based on a quality level determined for the input video stream, a mezzanine stage that transcodes the input video stream to a mezzanine format, and a transcoding stage that transcodes the input video stream in the mezzanine format into transcoded content according to the selected parameters, wherein the video analysis stage completes processing of the input video stream before the mezzanine stage completes processing of the input video stream.