US 12,206,873 B2
Video encoding and decoding method, apparatus and computer device
Wenfeng Cao, Shanghai (CN)
Assigned to Black Sesame Technologies Inc., San Jose, CA (US)
Filed by Black Sesame Technologies Inc., San Jose, CA (US)
Filed on Apr. 25, 2022, as Appl. No. 17/728,347.
Claims priority of application No. 202110615456.9 (CN), filed on Jun. 2, 2021.
Prior Publication US 2022/0394283 A1, Dec. 8, 2022
Int. Cl. H04N 19/119 (2014.01); H04N 19/146 (2014.01); H04N 19/176 (2014.01); H04N 19/33 (2014.01)
CPC H04N 19/33 (2014.11) [H04N 19/119 (2014.11); H04N 19/146 (2014.11); H04N 19/176 (2014.11)]
7 Claims
 
1. A video coding method, the method comprising:
for each non-key frame in a video frame sequence, dividing a current non-key frame into a plurality of sub image blocks according to information of an object in the current non-key image, and determining an importance level of each sub image block, the video frame sequence comprising a plurality of non-key frames acquired at a predetermined frame rate;
according to a pre-stored first correlation between different importance levels and different resolutions, performing conversion to make each sub image block in each non-key frame have a resolution corresponding to the importance level of the sub image block, wherein in the first correlation, a higher importance level corresponds to a higher resolution, and the highest importance level corresponds to a target highest resolution;
performing video encoding on the video frame sequence to obtain encoded video data;
performing video decoding on video data to obtain a decoded video frame sequence, wherein in each non-key frame in the decoded video frame sequence, an object area having a higher importance level has a higher resolution;
reconstructing each non-key frame into a non-key frame having the target highest resolution, to obtain a reconstructed video frame sequence having the target highest resolution;
wherein the decoded video frame sequence further comprises a plurality of key frames acquired at a variable frame rate;
reconstructing each non-key frame into a non-key frame having the target highest resolution, to obtain a reconstructed video frame sequence having the target highest resolution comprises:
for each non-key frame, acquiring a time instant of collection of the current non-key frame and a position of a carrier device at the time instant of collection of the current non-key image;
determining, from the plurality of key frames, a first key frame at the time closest to the time instant of collection and a second key frame at the position closest to the position at the time instant of collection;
on the basis of the first key frame and the second key frame, and a pre-trained super-resolution model, determining interpolation data of an area, having a resolution lower than the target highest resolution, of the current non-key frame, so as to reconstruct the current non-key image into a non-key image having the target highest resolution, wherein the super-resolution model data is obtained by training with a predetermined number of frames collected from objects having importance levels lower than that of the target highest resolution; and
combining the non-key frames reconstructed for each non-key frame and having the target highest resolution, to obtain a reconstructed video frame sequence having the target highest resolution.
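
Two steps of claim 1 can be illustrated with a minimal Python sketch: the lookup of a per-block resolution from the first correlation, and the selection of the first and second key frames. The sketch is illustrative only and is not taken from the patent. The table values in `FIRST_CORRELATION`, the names `block_resolution`, `KeyFrame`, and `select_key_frames`, the use of scale factors relative to the target highest resolution, and the use of Euclidean distance between 2-D positions are all assumptions made for the example.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical "first correlation": importance level -> scale factor
# relative to the target highest resolution. A higher level maps to a
# higher resolution, and the highest level (3 here) maps to the target itself.
FIRST_CORRELATION = {3: 1.0, 2: 0.5, 1: 0.25}


def block_resolution(importance: int, target: Tuple[int, int]) -> Tuple[int, int]:
    """Return the (width, height) assigned to a sub image block of the given level."""
    scale = FIRST_CORRELATION[importance]
    width, height = target
    return max(1, int(width * scale)), max(1, int(height * scale))


@dataclass(frozen=True)
class KeyFrame:
    """Assumed key-frame metadata: collection time and carrier-device position."""
    timestamp: float
    position: Tuple[float, float]


def select_key_frames(
    key_frames: List[KeyFrame],
    collected_at: float,
    carrier_position: Tuple[float, float],
) -> Tuple[KeyFrame, KeyFrame]:
    """Pick the key frame closest in time and the key frame closest in position."""
    first = min(key_frames, key=lambda k: abs(k.timestamp - collected_at))
    # Euclidean distance is an assumption; the claim does not fix a metric.
    second = min(key_frames, key=lambda k: math.dist(k.position, carrier_position))
    return first, second


if __name__ == "__main__":
    frames = [
        KeyFrame(0.0, (0.0, 0.0)),
        KeyFrame(10.0, (100.0, 0.0)),
        KeyFrame(20.0, (5.0, 0.0)),
    ]
    print(block_resolution(1, (1920, 1080)))
    print(select_key_frames(frames, 9.0, (4.0, 0.0)))
```

The two selected key frames may be the same frame or different frames; in the claim, both are then passed, together with the pre-trained super-resolution model, to the step that determines interpolation data for the lower-resolution areas.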