US 12,278,969 B2
Codec rate distortion compensating downsampler
Christopher Richard Schroers, Uster (CH); Roberto Gerson de Albuquerque Azevedo, Zurich (CH); Nicholas David Gregory, Zurich (CH); Yuanyi Xue, Alameda, CA (US); Scott Labrozzi, Cary, NC (US); and Abdelaziz Djelouah, Zurich (CH)
Assigned to Disney Enterprises, Inc., Burbank, CA (US); and ETH Zurich (Eidgenössische Technische Hochschule Zürich), Zürich (CH)
Filed by Disney Enterprises, Inc., Burbank, CA (US); and ETH Zurich (EIDGENÖSSISCHE TECHNISCHE HOCHSCHULE ZURICH), Zürich (CH)
Filed on Aug. 4, 2023, as Appl. No. 18/230,409.
Application 18/230,409 is a continuation of application No. 17/500,373, filed on Oct. 13, 2021, granted, now Pat. No. 11,765,360.
Prior Publication US 2023/0379475 A1, Nov. 23, 2023
Int. Cl. H04N 19/147 (2014.01); G06N 3/08 (2023.01); G06T 3/4046 (2024.01); G06T 9/00 (2006.01); H04N 19/132 (2014.01); H04N 19/184 (2014.01)
CPC H04N 19/147 (2014.11) [G06N 3/08 (2013.01); G06T 3/4046 (2013.01); G06T 9/002 (2013.01); H04N 19/132 (2014.11); H04N 19/184 (2014.11)] 18 Claims
OG exemplary drawing
 
1. A video processing system comprising:
an upsampler;
a video codec;
a trained machine learning (ML) model-based video downsampler trained using a neural network-based (NN-based) proxy video codec; and
a processing hardware configured to:
receive an input video sequence having a first display resolution;
extract a content sample of the input video sequence;
map, using the trained ML model-based video downsampler, the content sample to a lower resolution sample;
transform, using one of the video codec or the NN-based proxy video codec, the lower resolution sample into a decoded sample bitstream;
predict, using the upsampler and the decoded sample bitstream, an output sample corresponding to the content sample; and
modify, based on the predicted output sample, one or more parameters of the trained ML model-based video downsampler;
wherein the ML model-based video downsampler is trained using the input video sequence, the output sample, and an objective function based on an estimated rate of the lower resolution sample and a plurality of perceptual loss functions.
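Claim 1 recites a closed training loop: the ML model-based downsampler maps a content sample to a lower resolution, a codec (or a differentiable NN-based proxy codec) produces a decoded sample and an estimated rate, the upsampler predicts an output sample at the original resolution, and the downsampler's parameters are modified using an objective that combines the estimated rate with a plurality of perceptual losses, informally L = λ·R(x_low) + Σ_i w_i·L_perc,i(ŷ, x). The following PyTorch-style sketch illustrates one plausible shape of that loop; the module names, the residual downsampler architecture, the (decoded, rate) interface assumed for the proxy codec, and the weights lambda_rate and w_perc are illustrative assumptions, not details taken from the patent.

```python
# Minimal sketch of the claimed training loop, under the assumptions noted above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedDownsampler(nn.Module):
    """ML model-based video downsampler: maps a content sample to a lower-resolution sample."""
    def __init__(self, scale: int = 2):
        super().__init__()
        self.scale = scale
        self.refine = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, x):
        # Residual around bicubic downscaling keeps the learned mapping close to a plain downsampler.
        base = F.interpolate(x, scale_factor=1 / self.scale, mode="bicubic", align_corners=False)
        return base + self.refine(base)

def train_step(downsampler, proxy_codec, upsampler, optimizer, frames,
               lambda_rate=0.01, w_perc=(1.0, 0.1)):
    """One optimization step. `proxy_codec` is assumed to be a differentiable module
    returning (decoded_sample, estimated_rate); `upsampler` maps the decoded sample
    back to the input resolution. Both interfaces are assumptions for illustration."""
    low_res = downsampler(frames)              # map the content sample to a lower resolution
    decoded, est_rate = proxy_codec(low_res)   # NN-based proxy codec: reconstruction + rate estimate
    predicted = upsampler(decoded)             # predict the output sample corresponding to the input

    # Objective: estimated rate of the lower-resolution sample plus several
    # perceptual-style losses between the predicted output and the original content.
    # L1/MSE stand in here for perceptual metrics such as LPIPS or a VGG feature loss.
    perc_losses = [F.l1_loss(predicted, frames), F.mse_loss(predicted, frames)]
    loss = lambda_rate * est_rate + sum(w * l for w, l in zip(w_perc, perc_losses))

    optimizer.zero_grad()
    loss.backward()      # gradients flow through the differentiable proxy codec
    optimizer.step()     # modify the parameters of the ML model-based downsampler
    return loss.item()
```

In use, the optimizer would be constructed over the downsampler's parameters only (e.g. torch.optim.Adam(downsampler.parameters())), reflecting that the claim modifies the downsampler based on the predicted output sample rather than retraining the codec itself.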