CPC H04N 19/59 (2014.11) [G06T 3/18 (2024.01); G06V 10/761 (2022.01); H04N 19/117 (2014.11); H04N 19/503 (2014.11); H04N 19/70 (2014.11); H04N 19/80 (2014.11)]
14 Claims

1. A computer-implemented method to bi-directionally train a machine-learned video super-resolution (VSR) model using compressed video data, the method comprising:
obtaining, by a computing system comprising one or more computing devices, a set of ground truth video data that comprises a plurality of ground truth higher-resolution (HR) video frames and a plurality of lower-resolution (LR) video frames, wherein the plurality of LR video frames respectively correspond to the plurality of ground truth HR video frames, and wherein the plurality of ground truth HR video frames and the plurality of LR video frames are arranged in a temporal sequence that corresponds to a compressed video;
for each of one or more positions in the temporal sequence:
performing, by the computing system, a forward temporal prediction to generate a forward-predicted HR video frame for the current position in the temporal sequence based on one or more video frames associated with one or more previous positions in the temporal sequence, wherein performing the forward temporal prediction comprises processing, by the computing system and using the machine-learned VSR model, a previous HR video frame associated with a previous position in the temporal sequence, a previous LR video frame associated with the previous position in the temporal sequence, and a current LR video frame associated with a current position in the temporal sequence to generate the forward-predicted HR video frame for the current position in the temporal sequence;
performing, by the computing system, a backward temporal prediction to generate a backward-predicted HR video frame for the current position in the temporal sequence based on one or more video frames associated with one or more subsequent positions in the temporal sequence, wherein performing the backward temporal prediction comprises processing, by the computing system and using the machine-learned VSR model, a subsequent HR video frame associated with a subsequent position in the temporal sequence, a subsequent LR video frame associated with the subsequent position in the temporal sequence, and a current LR video frame associated with a current position in the temporal sequence to generate the backward-predicted HR video frame for the current position in the temporal sequence;
evaluating, by the computing system, a loss function for the machine-learned VSR model, wherein the loss function compares the ground truth HR video frame for the current position in the temporal sequence to the forward-predicted HR video frame and compares the ground truth HR video frame for the current position in the temporal sequence to the backward-predicted HR video frame; and
modifying, by the computing system, one or more values of one or more parameters of the machine-learned VSR model based on the loss function;
wherein the previous position in the temporal sequence comprises an immediately preceding position in the temporal sequence and wherein the subsequent position in the temporal sequence comprises an immediately succeeding position in the temporal sequence.
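The training procedure recited in claim 1 can be illustrated with a toy sketch. This is not the patented implementation, and every name in it is hypothetical. It assumes the following: frames are 1-D lists of floats; LR frames are produced by averaging adjacent pairs of HR samples (2x); the "VSR model" is a three-parameter linear predictor (`predict`, parameters `a`, `b`, `c`); the neighbouring HR frame fed to the model during training is the ground-truth frame; and the loss is the mean squared error of the forward prediction plus that of the backward prediction, summed over positions. Parameters are updated by plain gradient descent. Forward predictions use the immediately preceding position and backward predictions the immediately succeeding one, so one set of shared parameters is supervised from both temporal directions.

```python
import random
from typing import List, Tuple

Frame = List[float]   # a 1-D "frame" of pixel intensities (toy stand-in for an image)
Params = List[float]  # hypothetical toy-model parameters [a, b, c]


def downsample(hr: Frame) -> Frame:
    """Assumed 2x degradation: average each adjacent pair of HR samples."""
    return [(hr[i] + hr[i + 1]) / 2.0 for i in range(0, len(hr), 2)]


def upsample(lr: Frame) -> Frame:
    """Nearest-neighbour 2x upsampling: repeat each LR sample twice."""
    return [v for v in lr for _ in range(2)]


def predict(params: Params, ref_hr: Frame, ref_lr: Frame, cur_lr: Frame) -> Tuple[Frame, List[List[float]]]:
    """Toy stand-in for the VSR model (NOT the patented architecture).

    Predicts the current HR frame from a neighbouring HR/LR pair plus the
    current LR frame, and returns each output sample's partial derivatives
    with respect to [a, b, c] so the gradient can be computed by hand.
    """
    a, b, c = params
    pred, feats = [], []
    for u, h, r in zip(upsample(cur_lr), ref_hr, upsample(ref_lr)):
        detail = h - r  # high-frequency detail carried over from the neighbouring frame
        pred.append(a * u + b * detail + c)
        feats.append([u, detail, 1.0])
    return pred, feats


def bidirectional_loss_and_grad(params: Params, hr: List[Frame], lr: List[Frame]) -> Tuple[float, Params]:
    """Sum of forward and backward MSE losses over the sequence, plus its gradient."""
    loss, grad = 0.0, [0.0, 0.0, 0.0]
    num_frames = len(hr)
    for t in range(num_frames):
        neighbours = []
        if t > 0:
            neighbours.append(t - 1)  # forward prediction: immediately preceding position
        if t < num_frames - 1:
            neighbours.append(t + 1)  # backward prediction: immediately succeeding position
        for s in neighbours:
            pred, feats = predict(params, hr[s], lr[s], lr[t])
            n = len(pred)
            for p, target, f in zip(pred, hr[t], feats):
                err = p - target  # compare prediction with the ground-truth HR frame at t
                loss += err * err / n
                for k in range(3):
                    grad[k] += 2.0 * err * f[k] / n
    return loss, grad


def train(hr: List[Frame], steps: int = 200, learning_rate: float = 0.05) -> Tuple[Params, List[float]]:
    """Plain gradient descent on the bidirectional loss; returns params and loss history."""
    lr_frames = [downsample(f) for f in hr]
    params: Params = [0.0, 0.0, 0.0]
    history: List[float] = []
    for _ in range(steps):
        loss, grad = bidirectional_loss_and_grad(params, hr, lr_frames)
        history.append(loss)
        params = [p - learning_rate * g for p, g in zip(params, grad)]
    return params, history


if __name__ == "__main__":
    rng = random.Random(0)
    base = [rng.random() for _ in range(16)]
    # Five temporally correlated HR frames: a shared base plus small per-frame noise.
    hr_frames = [[v + 0.05 * rng.gauss(0.0, 1.0) for v in base] for _ in range(5)]
    params, history = train(hr_frames)
    print(f"loss {history[0]:.4f} -> {history[-1]:.4f}, params {params}")
```

Interior frames receive two supervised predictions per update, one from each direction. The first and last positions contribute only a backward or a forward term, respectively, because they have no predecessor or successor.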