US 11,973,964 B2
Video compression based on long range end-to-end deep learning
Franck Galpin, Thorigne-Fouillard (FR); Hien Pham, Paris (FR); and Jean Begaint, Menlo Park, CA (US)
Assigned to INTERDIGITAL MADISON PATENT HOLDINGS, SAS, Paris (FR)
Appl. No. 17/761,487
Filed by INTERDIGITAL MADISON PATENT HOLDINGS, SAS, Paris (FR)
PCT Filed Sep. 15, 2020, PCT No. PCT/US2020/050892
§ 371(c)(1), (2) Date Mar. 17, 2022,
PCT Pub. No. WO2021/055360, PCT Pub. Date Mar. 25, 2021.
Claims priority of application No. 19306147 (EP), filed on Sep. 20, 2019.
Prior Publication US 2022/0377358 A1, Nov. 24, 2022
Int. Cl. H04N 19/436 (2014.01); H04N 19/105 (2014.01); H04N 19/137 (2014.01); H04N 19/167 (2014.01); H04N 19/172 (2014.01); H04N 19/177 (2014.01); H04N 19/543 (2014.01); H04N 19/573 (2014.01); H04N 19/85 (2014.01)
CPC H04N 19/436 (2014.11) [H04N 19/105 (2014.11); H04N 19/137 (2014.11); H04N 19/167 (2014.11); H04N 19/172 (2014.11); H04N 19/177 (2014.11); H04N 19/543 (2014.11); H04N 19/573 (2014.11); H04N 19/85 (2014.11)]
28 Claims
 
1. A method for video encoding, comprising:
providing a region to encode and one or more reconstructed regions to a motion estimator to produce an output comprising an estimated bi-predicted motion field for the region to encode, the estimated bi-predicted motion field comprising two uni-directional motion fields;
providing the estimated bi-predicted motion field to an auto-encoder to produce an output comprising video data representative of the encoded region and reconstructed bi-predicted motion field;
providing the reconstructed bi-predicted motion field and one or more reconstructed regions to a deep neural network to produce an output comprising refined bi-predicted motion field for the region to encode by determining a motion correction that is applied to at least one of the two uni-directional motion fields of the reconstructed bi-predicted motion field to produce the refined bi-predicted motion field; and
providing the refined bi-predicted motion field and one or more reconstructed regions to a motion compensator to produce an output comprising reconstructed version of the region to encode.
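
The data flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every function name here is hypothetical, and the learned components are replaced by simple hand-written stand-ins: a global block search stands in for the motion estimator, coarse quantization for the auto-encoder, and a small bilateral search over corrections for the deep neural network. Only the order of the four stages, and the fact that the refinement stage sees nothing but reconstructed data, follow the claim.

```python
import numpy as np

def warp(ref, mv):
    """Fetch ref at positions displaced by an integer-rounded motion field (H, W, 2)."""
    h, w = ref.shape
    ys, xs = np.mgrid[0:h, 0:w]
    sy = np.clip(ys + np.rint(mv[..., 0]).astype(int), 0, h - 1)
    sx = np.clip(xs + np.rint(mv[..., 1]).astype(int), 0, w - 1)
    return ref[sy, sx]

def const_field(shape, dy, dx):
    """Build a motion field in which every pixel has the same displacement."""
    field = np.zeros(tuple(shape) + (2,), dtype=np.float32)
    field[..., 0], field[..., 1] = dy, dx
    return field

def estimate_bi_motion(region, ref0, ref1, radius=2):
    """Stand-in motion estimator: one global displacement per reference (assumption)."""
    offsets = [(dy, dx) for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)]
    fields = []
    for ref in (ref0, ref1):
        best = min(offsets, key=lambda d: np.abs(
            region - warp(ref, const_field(region.shape, *d))).sum())
        fields.append(const_field(region.shape, *best))
    return fields[0], fields[1]  # two uni-directional fields = one bi-predicted field

def autoencode_motion(mv0, mv1, step=2.0):
    """Stand-in auto-encoder: coarse quantization instead of a learned codec (assumption)."""
    symbols = np.rint(np.stack([mv0, mv1]) / step).astype(np.int8)
    recon = symbols.astype(np.float32) * step
    return symbols.tobytes(), (recon[0], recon[1])  # coded data, reconstructed field

def refine_motion(mv0, mv1, ref0, ref1):
    """Stand-in for the DNN: pick a correction for mv0 using reconstructed data only."""
    corrections = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    pred1 = warp(ref1, mv1)
    best = min(corrections, key=lambda c: np.abs(
        warp(ref0, mv0 + np.array(c, dtype=np.float32)) - pred1).sum())
    return mv0 + np.array(best, dtype=np.float32), mv1

def motion_compensate(mv0, mv1, ref0, ref1):
    """Average the two motion-compensated predictions."""
    return 0.5 * (warp(ref0, mv0) + warp(ref1, mv1))

def encode_region(region, ref0, ref1):
    """Chain the four stages in the order recited in claim 1."""
    estimated = estimate_bi_motion(region, ref0, ref1)
    data, reconstructed = autoencode_motion(*estimated)
    refined = refine_motion(*reconstructed, ref0, ref1)
    return data, motion_compensate(*refined, ref0, ref1)

# Toy usage: the references are horizontally shifted copies of the region.
rng = np.random.default_rng(0)
region = rng.random((16, 16))
ref0 = np.roll(region, 1, axis=1)
ref1 = np.roll(region, -2, axis=1)
data, reconstruction = encode_region(region, ref0, ref1)
```

In this toy setup the quantizer loses the one-pixel displacement toward `ref0`, and the refinement stage recovers it without any extra transmitted data, which is the role the claim assigns to the motion correction. A decoder could run the same refinement because it depends only on the reconstructed motion field and the reconstructed regions.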