US 12,348,743 B2
Method and system for learned video compression
Sam Tak Wu Kwong, Kowloon (HK); Haifeng Guo, Kowloon (HK); Shiqi Wang, Kowloon (HK); and Dongjie Ye, Kowloon (HK)
Assigned to City University of Hong Kong, Kowloon (HK)
Filed by City University of Hong Kong, Kowloon (HK)
Filed on Jun. 21, 2023, as Appl. No. 18/338,731.
Prior Publication US 2024/0430463 A1, Dec. 26, 2024
Int. Cl. H04N 19/42 (2014.01); H04N 19/12 (2014.01); H04N 19/139 (2014.01); H04N 19/172 (2014.01); H04N 19/176 (2014.01); H04N 19/60 (2014.01); H04N 19/85 (2014.01)

CPC H04N 19/42 (2014.11) [H04N 19/12 (2014.11); H04N 19/139 (2014.11); H04N 19/172 (2014.11); H04N 19/176 (2014.11); H04N 19/60 (2014.11); H04N 19/85 (2014.11)]

18 Claims

1. A computer-implemented method for learned video compression, comprising:

processing a current frame (x_t) and a previously decoded frame (x_t-1) of video data using a motion estimation model to estimate a motion vector (v_t) for every pixel;

compressing the motion vectors (v_t) and reconstructing the motion vectors (v_t) to reconstructed motion vectors (v_t);

applying an enhanced context mining (ECM) model to obtain enhanced context (C_E) from the reconstructed motion vectors (v_t) and previously decoded frame feature (x̆_t-1); wherein applying the ECM comprises:

obtaining the motion vectors (v_t) and decoded frame feature (x̆_t-1) based on the current input frame (x_t) and previously decoded frame (x_t-1);

warping the motion vectors (v_t) and decoded frame feature (x̆_t-1) to obtain a warped feature (x_t); and

processing the warped feature (x_t) using a resblock and convolution layer to obtain a context ([custom character]_t);

compressing the current frame (x_t) with the assistance of the enhanced context (C_E) to obtain a reconstructed frame (x′_t); and

providing the reconstructed frame (x′_t) to a post-enhancement backend network to obtain a high-resolution frame (x_t).
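
The context-mining step recited in the claim (warp the decoded frame feature with the motion vectors, then pass the warped feature through a resblock and a convolution layer) can be illustrated with a minimal NumPy sketch. This is not the patented implementation. The claim does not specify the warping scheme, the resblock layout, kernel sizes, or channel counts, so the sketch assumes bilinear backward warping with border clamping, a ReLU-conv-ReLU-conv resblock with a skip connection, 3x3 kernels, and random untrained weights. The names warp, conv3x3, resblock, and context_from_motion are hypothetical.

```python
import numpy as np

def warp(feature, flow):
    """Backward-warp feature (C, H, W) by per-pixel motion vectors flow (2, H, W).

    Assumption: flow[0] is the horizontal and flow[1] the vertical displacement
    in pixels; sampling is bilinear and coordinates are clamped to the border.
    """
    _, h, w = feature.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sx = np.clip(xs + flow[0], 0, w - 1)   # source x coordinate per pixel
    sy = np.clip(ys + flow[1], 0, h - 1)   # source y coordinate per pixel
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx, wy = sx - x0, sy - y0              # bilinear interpolation weights
    top = feature[:, y0, x0] * (1 - wx) + feature[:, y0, x1] * wx
    bottom = feature[:, y1, x0] * (1 - wx) + feature[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy

def conv3x3(x, weight, bias):
    """3x3 convolution, stride 1, zero padding 1.

    x: (Cin, H, W), weight: (Cout, Cin, 3, 3), bias: (Cout,).
    """
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((weight.shape[0], h, w))
    for dy in range(3):
        for dx in range(3):
            patch = padded[:, dy:dy + h, dx:dx + w]
            out += np.einsum("oc,chw->ohw", weight[:, :, dy, dx], patch)
    return out + bias[:, None, None]

def resblock(x, params):
    """Assumed residual block: ReLU -> conv -> ReLU -> conv, plus skip connection."""
    (w1, b1), (w2, b2) = params
    y = conv3x3(np.maximum(x, 0), w1, b1)
    y = conv3x3(np.maximum(y, 0), w2, b2)
    return x + y

def context_from_motion(flow, feature, params):
    """Warp the decoded frame feature, then apply the resblock and a convolution."""
    warped = warp(feature, flow)
    return conv3x3(resblock(warped, params["res"]), *params["out"])

# Toy inputs: 4 feature channels on an 8x8 grid, zero motion, random weights.
rng = np.random.default_rng(0)
channels, height, width = 4, 8, 8

def random_conv():
    return (0.1 * rng.standard_normal((channels, channels, 3, 3)), np.zeros(channels))

feat = rng.standard_normal((channels, height, width))
flow = np.zeros((2, height, width))
params = {"res": (random_conv(), random_conv()), "out": random_conv()}
context = context_from_motion(flow, feat, params)
print(context.shape)  # (4, 8, 8)
```

With zero motion the warp returns the input feature unchanged, and a uniform horizontal displacement of one pixel shifts the feature by one column, which gives a quick sanity check of the sampling convention assumed here.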