| CPC H04N 19/42 (2014.11) [H04N 19/172 (2014.11); H04N 19/91 (2014.11)] | 20 Claims |

|
1. A system comprising:
a first component to extract temporal features from a current frame being coded and a previous frame of a video, wherein three-dimensional based joint features are determined using the temporal features and spatial features from the current frame;
a second component that uses a first transformer to receive the three-dimensional based joint features as input and fuse the spatial features from the current frame with the temporal features to generate spatio-temporal features as a first output;
a third component that uses a second transformer to perform entropy coding using the first output and at least a portion of the temporal features to generate a second output, wherein the second transformer is used to fuse the spatio-temporal features with the at least a portion of the temporal features to output fused spatio-temporal features that are entropy encoded to generate the second output; and
a fourth component that uses a third transformer to reconstruct the current frame, wherein the first output is processed using the second output to generate a third output, and wherein the third transformer fuses the temporal features with the third output.
|
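The data flow recited in claim 1 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: frames are modeled as short lists of token vectors, temporal features are plain frame differences, the joint features are a concatenation of spatial and temporal tokens, each "transformer" is reduced to one parameter-free dot-product attention step (`attention_fuse`), `zlib` stands in for the entropy coder, and the "processing" of the first output with the second output is an element-wise average. All function names are hypothetical.

```python
import math
import struct
import zlib


def attention_fuse(queries, context):
    """Toy single-head dot-product attention with a residual connection.

    Stand-in for a transformer block: no learned weights, one head.
    """
    dim = len(queries[0])
    fused = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        attended = [
            sum(w / total * k[i] for w, k in zip(weights, context)) for i in range(dim)
        ]
        fused.append([qi + ai for qi, ai in zip(q, attended)])
    return fused


def extract_temporal(current, previous):
    """First component (assumed form): temporal features as frame differences."""
    return [[c - p for c, p in zip(ct, pt)] for ct, pt in zip(current, previous)]


def first_transformer(joint, n_spatial):
    """Second component: fuse spatial and temporal tokens into the first output."""
    return attention_fuse(joint, joint)[:n_spatial]


def second_transformer_encode(first_output, temporal_part, scale=16):
    """Third component: fuse, quantize, then entropy code (zlib as a stand-in)."""
    fused = attention_fuse(first_output, temporal_part)
    symbols = [max(-128, min(127, round(v * scale))) for tok in fused for v in tok]
    return zlib.compress(struct.pack(f"{len(symbols)}b", *symbols))


def decode_symbols(second_output, dim, scale=16):
    """Invert the stand-in entropy coder back to dequantized token vectors."""
    raw = zlib.decompress(second_output)
    values = [s / scale for s in struct.unpack(f"{len(raw)}b", raw)]
    return [values[i:i + dim] for i in range(0, len(values), dim)]


def reconstruct(first_output, second_output, temporal):
    """Fourth component: build the third output, then fuse it with temporal features."""
    decoded = decode_symbols(second_output, len(first_output[0]))
    # Assumed "processing": element-wise average of first output and decoded symbols.
    third_output = [
        [(a + b) / 2 for a, b in zip(x, y)] for x, y in zip(first_output, decoded)
    ]
    return attention_fuse(third_output, temporal)


def code_frame(current, previous):
    """Run the four components in the order recited in the claim."""
    temporal = extract_temporal(current, previous)
    joint = current + temporal  # assumed joint features: spatial tokens, then temporal
    first_output = first_transformer(joint, len(current))
    portion = temporal[: max(1, len(temporal) // 2)]  # "at least a portion"
    second_output = second_transformer_encode(first_output, portion)
    return second_output, reconstruct(first_output, second_output, temporal)


previous_frame = [[0.0, 0.1], [0.2, 0.3], [0.4, 0.5]]
current_frame = [[0.1, 0.1], [0.2, 0.4], [0.5, 0.5]]
bitstream, reconstruction = code_frame(current_frame, previous_frame)
```

Here `bitstream` plays the role of the second output and `reconstruction` has the same token layout as `current_frame`. The sketch only mirrors which component consumes which output. It makes no claim about reconstruction quality, and a trained system would replace each stand-in with learned layers and a proper entropy model.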