| CPC H04N 19/147 (2014.11) [H04N 19/177 (2014.11)] | 13 Claims |

|
1. A method of performing video coding for machine (VCM) image enhancement, the method being executed by at least one processor and comprising:
obtaining a coded image from a coded bitstream;
obtaining enhancement parameters corresponding to the coded image;
decoding the coded image using a VCM decoding module to generate a decoded image;
generating an enhanced image using an enhancement module based on the decoded image and the enhancement parameters, wherein the enhancement parameters are optimized for one of a human vision VCM task, a machine vision VCM task, and a human-machine hybrid vision VCM task, the enhancement module comprises a neural network, the enhancement parameters comprise neural network parameters corresponding to the neural network, the enhanced image is generated using rate-distortion optimization, the neural network parameters are selected based on a distortion metric and a parameter size, the distortion metric comprises at least one from among a mean square error, a structure similarity metric, and a multi-scale structure similarity metric associated with the enhanced image and the input image; and
providing at least one of the decoded image and the enhanced image to at least one of a human vision module and a machine vision module for performing the one of the human vision VCM task, the machine vision VCM task, and the human-machine hybrid vision VCM task,
wherein the enhancement parameters are based on features of an input image, features of the enhanced image, a number of channels of a feature map, a number of rows of the feature map, a number of columns of the feature map, a channel index, a row, and a column position, and
the mean square error is calculated using the following equation:
$$\mathrm{MSE} = \frac{1}{C \cdot H \cdot W} \sum_{c=1}^{C} \sum_{h=1}^{H} \sum_{w=1}^{W} \left( f(c,h,w) - \hat{f}(c,h,w) \right)^{2}$$
where MSE represents the mean square error, $f(c, h, w)$ represents the features of the input image, $\hat{f}(c, h, w)$ represents the features of the enhanced image, C represents the number of channels of the feature map, H represents the number of the rows of the feature map, W represents the number of the columns of the feature map, c represents the channel index, h represents the row, and w represents the column position.
|
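The feature-domain MSE and the rate-distortion selection of neural network parameters can be illustrated with a minimal Python sketch. It assumes $f$ holds the input-image features and $\hat{f}$ the enhanced-image features, as in the definitions above. The claim says only that parameters are selected "based on a distortion metric and a parameter size." It does not say how the two are combined. This sketch therefore assumes the common Lagrangian cost J = D + λ·R, where D is the feature MSE and R is the parameter size in bits. The names `Candidate`, `param_bits`, `lam`, and `select_parameters` are hypothetical and exist only for illustration. They are not the claimed implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Feature map indexed as fmap[c][h][w]: C channels, H rows, W columns.
FeatureMap = Sequence[Sequence[Sequence[float]]]


def _shape(fmap: FeatureMap) -> Tuple[int, int, int]:
    """Return (C, H, W), raising ValueError if the nested lists are ragged."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    if any(len(ch) != H or any(len(row) != W for row in ch) for ch in fmap):
        raise ValueError("feature map is not a regular C x H x W array")
    return C, H, W


def feature_mse(f: FeatureMap, f_hat: FeatureMap) -> float:
    """MSE between input-image features f and enhanced-image features f_hat,
    averaged over all C * H * W positions (the equation in the claim)."""
    C, H, W = _shape(f)
    if _shape(f_hat) != (C, H, W):
        raise ValueError("feature maps must have the same C x H x W shape")
    total = sum(
        (f[c][h][w] - f_hat[c][h][w]) ** 2
        for c in range(C) for h in range(H) for w in range(W)
    )
    return total / (C * H * W)


@dataclass
class Candidate:
    """Hypothetical candidate set of enhancement (neural network) parameters."""
    name: str
    enhanced_features: FeatureMap  # features of the image enhanced with this set
    param_bits: int                # assumed size of the parameters to signal, in bits


def select_parameters(f: FeatureMap, candidates: List[Candidate], lam: float) -> Candidate:
    """Assumed RD rule: pick the candidate minimizing J = MSE + lam * param_bits."""
    return min(
        candidates,
        key=lambda cand: feature_mse(f, cand.enhanced_features) + lam * cand.param_bits,
    )


# Toy example: one channel, 2 x 2 feature map.
f = [[[1.0, 2.0], [3.0, 4.0]]]
small = Candidate("small", [[[1.0, 2.0], [3.0, 6.0]]], param_bits=100)   # MSE = 1.0
large = Candidate("large", [[[1.0, 2.0], [3.0, 4.5]]], param_bits=1000)  # MSE = 0.0625
print(select_parameters(f, [small, large], lam=0.01).name)    # small: 2.0 < 10.0625
print(select_parameters(f, [small, large], lam=0.0001).name)  # large: 0.1625 < 1.01
```

A larger λ favors compact parameter sets that are cheaper to signal. A smaller λ favors parameter sets whose enhanced features lie closer to the input features. Under these assumptions, SSIM or MS-SSIM could replace the MSE as D, for example by using 1 − SSIM as the distortion.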