US 12,355,978 B2
Enhancement process for video coding for machines
Wen Gao, West Windsor, NJ (US); Xiaozhong Xu, State College, PA (US); and Shan Liu, San Jose, CA (US)
Assigned to TENCENT AMERICA LLC, Palo Alto, CA (US)
Filed by TENCENT AMERICA LLC, Palo Alto, CA (US)
Filed on Dec. 6, 2022, as Appl. No. 18/076,020.
Claims priority of provisional application 63/313,616, filed on Feb. 24, 2022.
Prior Publication US 2023/0269378 A1, Aug. 24, 2023
Int. Cl. H04N 19/147 (2014.01); H04N 19/177 (2014.01)

CPC H04N 19/147 (2014.11) [H04N 19/177 (2014.11)]

13 Claims

1. A method of performing video coding for machine (VCM) image enhancement, the method being executed by at least one processor and comprising:

obtaining a coded image from a coded bitstream;

obtaining enhancement parameters corresponding to the coded image;

decoding the coded image using a VCM decoding module to generate a decoded image;

generating an enhanced image using an enhancement module based on the decoded image and the enhancement parameters, wherein the enhancement parameters are optimized for one of a human vision VCM task, a machine vision VCM task, and a human-machine hybrid vision VCM task, the enhancement module comprises a neural network, the enhancement parameters comprise neural network parameters corresponding to the neural network, the enhanced image is generated using rate-distortion optimization, the neural network parameters are selected based on a distortion metric and a parameter size, the distortion metric comprises at least one from among a mean square error, a structure similarity metric, and a multi-scale structure similarity metric associated with the enhanced image and the input image; and

providing at least one of the decoded image and the enhanced image to at least one of a human vision module and a machine vision module for performing the one of the human vision VCM task, the machine vision VCM task, and the human-machine hybrid vision VCM task,

wherein the enhancement parameters are based on features of an input image, features of the enhanced image, a number of channels of a feature map, a number of rows of the feature map, a number of columns of the feature map, a channel index, a row, and a column position, and

the mean square error is calculated using a following equation:

$$\mathrm{MSE} = \frac{1}{C \times H \times W} \sum_{c=1}^{C} \sum_{h=1}^{H} \sum_{w=1}^{W} \left( f(c, h, w) - \hat{f}(c, h, w) \right)^{2}$$

where MSE represents the mean square error, $f(c, h, w)$ represents the features of the input image, $\hat{f}(c, h, w)$ represents the features of the enhanced image, C represents the number of channels of the feature map, H represents the number of the rows of the feature map, W represents the number of the columns of the feature map, c represents the channel index, h represents the row, and w represents the column position.
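
The feature-domain mean square error recited in the claim, and the selection of neural network parameters based on a distortion metric and a parameter size, can be illustrated with a minimal Python sketch. It assumes the conventional form of the mean square error (sum of squared feature differences divided by C × H × W) and feature maps stored as nested lists indexed [c][h][w]. The function names `feature_mse` and `select_parameters`, the candidate tuple layout, and the Lagrangian weighting `lam` are hypothetical and are not taken from the claim, which does not specify how distortion and parameter size are combined.

```python
from typing import List, Sequence, Tuple

# A feature map indexed as [c][h][w]: C channels, H rows, W columns.
FeatureMap = Sequence[Sequence[Sequence[float]]]


def feature_mse(f: FeatureMap, f_hat: FeatureMap) -> float:
    """Mean square error between input-image features f and
    enhanced-image features f_hat, averaged over C * H * W entries."""
    C = len(f)
    H = len(f[0])
    W = len(f[0][0])
    total = 0.0
    for c in range(C):
        for h in range(H):
            for w in range(W):
                diff = f[c][h][w] - f_hat[c][h][w]
                total += diff * diff
    return total / (C * H * W)


def select_parameters(
    candidates: List[Tuple[str, FeatureMap, float]],
    f: FeatureMap,
    lam: float,
) -> str:
    """Return the name of the candidate with the lowest cost.

    Each candidate is (name, enhanced_features, parameter_size).
    ASSUMPTION: cost = distortion + lam * parameter_size, one common
    rate-distortion formulation; the claim does not state this formula.
    """
    best = min(candidates, key=lambda cand: feature_mse(f, cand[1]) + lam * cand[2])
    return best[0]


# Toy example: one channel, 2 x 2 feature map.
f_in = [[[1.0, 2.0], [3.0, 4.0]]]
f_small = [[[1.0, 2.0], [3.0, 2.0]]]   # one entry differs by 2
f_large = [[[1.0, 2.0], [3.0, 4.0]]]   # matches the input features exactly

candidates = [("small", f_small, 100.0), ("large", f_large, 1000.0)]
mse_small = feature_mse(f_in, f_small)
choice_high_lam = select_parameters(candidates, f_in, lam=0.01)
choice_low_lam = select_parameters(candidates, f_in, lam=0.0001)
```

In this toy example a larger `lam` penalizes parameter size more heavily and favors the smaller parameter set, while a smaller `lam` favors the parameter set with lower distortion.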