US 12,244,834 B2
Utilizing hierarchical structure for neural network based tools in video coding
Zeqiang Li, Palo Alto, CA (US); Xiaozhong Xu, State College, PA (US); Wei Wang, Palo Alto, CA (US); Wei Jiang, Sunnyvale, CA (US); and Shan Liu, San Jose, CA (US)
Assigned to TENCENT AMERICA LLC, Palo Alto, CA (US)
Filed by TENCENT AMERICA LLC, Palo Alto, CA (US)
Filed on Sep. 17, 2021, as Appl. No. 17/478,138.
Claims priority of provisional application 63/136,055, filed on Jan. 11, 2021.
Prior Publication US 2022/0224924 A1, Jul. 14, 2022
Int. Cl. H04N 19/33 (2014.01); H04N 19/105 (2014.01); H04N 19/136 (2014.01); H04N 19/172 (2014.01); H04N 19/436 (2014.01); H04N 19/593 (2014.01)
CPC H04N 19/33 (2014.11) [H04N 19/105 (2014.11); H04N 19/136 (2014.11); H04N 19/172 (2014.11); H04N 19/436 (2014.11); H04N 19/593 (2014.11)]
20 Claims
 
1. A method of video coding, executable by a processor, comprising:
receiving video data including a plurality of pictures,
wherein each of the plurality of pictures is associated with a respective hierarchical temporal level ID,
wherein a hierarchical temporal ID indicates a level of a respective picture in a predefined hierarchical structure for decoding, and
wherein the predefined hierarchical structure for decoding is based on frequencies and quantization parameters of the plurality of pictures;
generating virtual reference frames corresponding to a first number of pictures of the plurality of pictures using a neural network based on each picture in the first number of pictures being at a pre-determined level in the predefined hierarchical structure,
wherein the generating the virtual reference frames for a respective picture in the first number of pictures is based on the neural network using more than one nearest decoded picture to the respective picture as input,
wherein the pre-determined level comprises one or more hierarchical temporal IDs having associated frequencies and quantization parameters higher than a threshold; and
decoding the video data based on the virtual reference frames generated using the neural network.
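
For illustration only, the selection and generation steps recited in claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the names Picture, select_levels, nearest_decoded, average_frames and generate_virtual_refs are hypothetical, a frame is reduced to a flat list of sample values, the per-level "frequency" is treated as a plain number whose measurement is left open, and a simple sample-wise average stands in for the neural network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Sequence, Set, Tuple

Frame = List[float]  # assumption: a frame reduced to a flat list of samples


@dataclass
class Picture:
    poc: int          # hypothetical field: position of the picture in output order
    temporal_id: int  # hierarchical temporal level ID of the picture


def select_levels(level_stats: Dict[int, Tuple[float, float]],
                  freq_threshold: float, qp_threshold: float) -> Set[int]:
    """Return temporal IDs whose frequency AND QP both exceed the thresholds."""
    return {tid for tid, (freq, qp) in level_stats.items()
            if freq > freq_threshold and qp > qp_threshold}


def nearest_decoded(poc: int, decoded_pocs: Iterable[int], k: int = 2) -> List[int]:
    """Return the k decoded positions closest to poc (ties go to the lower one)."""
    return sorted(decoded_pocs, key=lambda p: (abs(p - poc), p))[:k]


def average_frames(frames: Sequence[Frame]) -> Frame:
    """Stand-in for the neural network: sample-wise mean of the input frames."""
    return [sum(samples) / len(frames) for samples in zip(*frames)]


def generate_virtual_refs(pictures: Sequence[Picture],
                          decoded: Dict[int, Frame],
                          selected_levels: Set[int],
                          model: Callable[[Sequence[Frame]], Frame] = average_frames,
                          k: int = 2) -> Dict[int, Frame]:
    """Build a virtual reference frame for each picture at a selected level."""
    virtual: Dict[int, Frame] = {}
    for pic in pictures:
        if pic.temporal_id not in selected_levels:
            continue  # pictures at other levels get no virtual reference
        neighbours = nearest_decoded(pic.poc, decoded.keys(), k)
        if len(neighbours) < 2:
            continue  # the claim calls for more than one nearest decoded picture
        virtual[pic.poc] = model([decoded[p] for p in neighbours])
    return virtual


# Example with made-up values: a group of five pictures on three temporal levels.
pictures = [Picture(0, 0), Picture(4, 0), Picture(2, 1), Picture(1, 2), Picture(3, 2)]
decoded = {0: [0.0, 0.0], 2: [2.0, 4.0], 4: [4.0, 8.0]}   # already-decoded frames
level_stats = {0: (1.0, 30.0), 1: (1.0, 33.0), 2: (2.0, 36.0)}  # tid -> (frequency, QP)

levels = select_levels(level_stats, freq_threshold=1.0, qp_threshold=34.0)
virtual_refs = generate_virtual_refs(pictures, decoded, levels)
```

In this made-up example only temporal level 2 exceeds both thresholds, so virtual reference frames are produced for the pictures at positions 1 and 3, each from its two nearest decoded neighbours. A decoder following the claim would then use these frames as additional references; that final decoding step is outside the scope of the sketch.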