US 12,477,150 B2
Compression-informed video super-resolution
Yinxiao Li, Sunnyvale, CA (US); Peyman Milanfar, Menlo Park, CA (US); Feng Yang, Sunnyvale, CA (US); Ce Liu, Cambridge, MA (US); Ming-Hsuan Yang, Cupertino, CA (US); and Pengchong Jin, Mountain View, CA (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Appl. No. 18/256,837
Filed by Google LLC, Mountain View, CA (US)
PCT Filed Aug. 5, 2021, PCT No. PCT/US2021/044630
§ 371(c)(1), (2) Date Jun. 9, 2023,
PCT Pub. No. WO2022/231643, PCT Pub. Date Nov. 3, 2022.
Claims priority of provisional application 63/179,795, filed on Apr. 26, 2021.
Prior Publication US 2024/0022760 A1, Jan. 18, 2024
Int. Cl. H04N 19/59 (2014.01); G06T 3/18 (2024.01); G06V 10/74 (2022.01); H04N 19/117 (2014.01); H04N 19/503 (2014.01); H04N 19/70 (2014.01); H04N 19/80 (2014.01)
CPC H04N 19/59 (2014.11) [G06T 3/18 (2024.01); G06V 10/761 (2022.01); H04N 19/117 (2014.11); H04N 19/503 (2014.11); H04N 19/70 (2014.11); H04N 19/80 (2014.11)] 14 Claims
OG exemplary drawing
 
1. A computer-implemented method to bi-directionally train a machine-learned video super-resolution (VSR) model using compressed video data, the method comprising:
obtaining, by a computing system comprising one or more computing devices, a set of ground truth video data that comprises a plurality of ground truth higher-resolution (HR) video frames and a plurality of lower-resolution (LR) video frames, wherein the plurality of LR video frames respectively correspond to the plurality of ground truth HR video frames, and wherein the plurality of ground truth HR video frames and the plurality of LR video frames are arranged in a temporal sequence that corresponds to a compressed video;
for each of one or more positions in the temporal sequence:
performing, by the computing system, a forward temporal prediction to generate a forward-predicted HR video frame for the current position in the temporal sequence based on one or more video frames associated with one or more previous positions in the temporal sequence, wherein performing the forward temporal prediction comprises processing, by the computing system and using the machine-learned VSR model, a previous HR video frame associated with a previous position in the temporal sequence, a previous LR video frame associated with the previous position in the temporal sequence, and a current LR video frame associated with a current position in the temporal sequence to generate the forward-predicted HR video frame for the current position in the temporal sequence;
performing, by the computing system, a backward temporal prediction to generate a backward-predicted HR video frame for the current position in the temporal sequence based on one or more video frames associated with one or more subsequent positions in the temporal sequence, wherein performing the backward temporal prediction comprises processing, by the computing system and using the machine-learned VSR model, a subsequent HR video frame associated with a subsequent position in the temporal sequence, a subsequent LR video frame associated with the subsequent position in the temporal sequence, and the current LR video frame associated with the current position in the temporal sequence to generate the backward-predicted HR video frame for the current position in the temporal sequence;
evaluating, by the computing system, a loss function for the machine-learned VSR model, wherein the loss function compares a ground truth HR video frame associated with the current position in the temporal sequence to the forward-predicted HR video frame and compares the ground truth HR video frame to the backward-predicted HR video frame; and
modifying, by the computing system, one or more values of one or more parameters of the machine-learned VSR model based on the loss function;
wherein the previous position in the temporal sequence comprises an immediately preceding position in the temporal sequence and wherein the subsequent position in the temporal sequence comprises an immediately subsequent position in the temporal sequence.
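
As an informal illustration of the bidirectional training procedure recited in claim 1, the following Python sketch shows one possible realization using PyTorch. The TinyVSR module, tensor shapes, L1 loss, Adam optimizer, and 2x scale factor are illustrative assumptions and are not taken from the patent; the sketch only mirrors the claimed steps: a forward temporal prediction from the immediately preceding position, a backward temporal prediction from the immediately subsequent position, a loss that compares the ground truth HR frame with both predictions, and a parameter update based on that loss.

import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyVSR(nn.Module):
    # Toy stand-in for the machine-learned VSR model (assumed architecture).
    # Inputs: a neighboring HR frame, the matching neighboring LR frame, and
    # the current LR frame; output: a predicted HR frame for the current position.
    def __init__(self, scale=2, channels=3, width=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels * 3, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, channels * scale * scale, 3, padding=1),
        )
        self.upsample = nn.PixelShuffle(scale)

    def forward(self, neighbor_hr, neighbor_lr, current_lr):
        # Downsample the neighboring HR frame to LR size so the three inputs
        # can be concatenated along the channel dimension.
        neighbor_hr_small = F.interpolate(
            neighbor_hr, size=current_lr.shape[-2:],
            mode="bilinear", align_corners=False,
        )
        x = torch.cat([neighbor_hr_small, neighbor_lr, current_lr], dim=1)
        return self.upsample(self.body(x))


def bidirectional_train_step(model, optimizer, hr_frames, lr_frames):
    # hr_frames / lr_frames: lists of corresponding HR and LR tensors arranged
    # in the temporal sequence of the compressed video.
    total_loss = 0.0
    for t in range(1, len(lr_frames) - 1):
        # Forward temporal prediction from the immediately preceding position.
        pred_fwd = model(hr_frames[t - 1], lr_frames[t - 1], lr_frames[t])
        # Backward temporal prediction from the immediately subsequent position.
        pred_bwd = model(hr_frames[t + 1], lr_frames[t + 1], lr_frames[t])
        # The loss compares the ground truth HR frame with both predictions
        # (L1 is chosen here purely for illustration).
        total_loss = total_loss + (F.l1_loss(pred_fwd, hr_frames[t])
                                   + F.l1_loss(pred_bwd, hr_frames[t]))
    # Modify the model parameters based on the loss.
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
    return total_loss.item()


# Illustrative usage: five random frame pairs, batch of 1, 2x super-resolution.
model = TinyVSR(scale=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
hr = [torch.rand(1, 3, 64, 64) for _ in range(5)]
lr = [F.interpolate(f, scale_factor=0.5, mode="bilinear", align_corners=False)
      for f in hr]
print(bidirectional_train_step(model, optimizer, hr, lr))

In this sketch the same model weights are shared by the forward and backward predictions, so each interior position of the sequence contributes two loss terms, one from each temporal direction, before the single parameter update.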