| CPC G06T 5/60 (2024.01) [G06T 5/92 (2024.01); G06T 5/94 (2024.01); G06T 2207/10024 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01)] | 4 Claims | 

               1. A method for enhancement of a low-light image based on reinforcement learning and aesthetic evaluation, comprising: 
              S1, generating images of non-normal luminance under different lighting scenes, and constructing a training dataset for a reinforcement learning system based on the images of non-normal luminance; 
                S2, initializing the training dataset, a policy network, and a value network in the reinforcement learning system; 
                S3, updating, based on a no-reference reward score and an aesthetic assessment reward score, the policy network and the value network; 
                S4, completing model training when all samples are trained and all training iterations are completed; 
                S5, outputting an image result after the enhancement of the low-light image; 
wherein the initializing of the policy network and the value network in the operation S2 includes:
                inputting a current state s(t) into the policy network and the value network, wherein s(t) denotes a state at a time t; an output of the policy network is a policy π(a(t)|s(t)) for taking an action a(t); and an output of the value network is a value network output value V(s(t)), representing an expected total reward from the current state s(t); 
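For illustration, the actor-critic interface described above can be sketched as follows; the shared convolutional encoder, the layer sizes, and the use of PyTorch are assumptions not stated in the claim, and only the interface (state s(t) in, π(a(t)|s(t)) and V(s(t)) out) follows the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyValueNet(nn.Module):
    """Illustrative policy and value heads over a shared CNN encoder.

    The state s(t) is the current (partially enhanced) RGB image; the policy
    head outputs pi(a(t)|s(t)) over |A| discrete enhancement actions and the
    value head outputs V(s(t)). Layer sizes are placeholders.
    """
    def __init__(self, num_actions: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.policy_head = nn.Linear(64, num_actions)  # output dimension |A|
        self.value_head = nn.Linear(64, 1)

    def forward(self, state: torch.Tensor):
        h = self.encoder(state)
        pi = F.softmax(self.policy_head(h), dim=-1)   # pi(a(t)|s(t))
        v = self.value_head(h).squeeze(-1)            # V(s(t))
        return pi, v
```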
the updating of the policy network and the value network in the operation S3 includes:
S3.1, training on the training dataset based on historical phase images to obtain an environmental reward value, denoted as R(t), using the following equation:
R(t) = Σ_i γ^i·r(t+i), wherein γ^i denotes an ith power of a discount factor γ and r(t) represents an immediate environmental reward value at the time t; wherein
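A minimal sketch of computing the discounted return R(t) = Σ_i γ^i·r(t+i) for every step of one enhancement episode; the function name and the default discount factor are illustrative.

```python
def discounted_return(rewards, gamma=0.99):
    """Compute R(t) = sum_i gamma**i * r(t+i) for every step t of an episode.

    `rewards` is the list of immediate environmental rewards r(t) collected
    while enhancing one image; gamma is the discount factor.
    """
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return list(reversed(returns))
```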
                the following influence factors are taken into account for obtaining the environmental reward value: 
              a spatial consistency loss, denoted as Lspa: 
Lspa = (1/K) Σ_{i=1}^{K} Σ_{j∈Ω(i)} (|Y_i − Y_j| − |I_i − I_j|)², wherein K represents a size of a local region; Ω(i) represents four neighboring regions centered on a region i; Y represents an average grayscale value of pixels in a local region of an enhanced image; and I represents an average grayscale value of pixels in a local region of an input image;
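A possible PyTorch sketch of the spatial consistency loss: the mean gray channel is pooled into local regions and the differences to the four shifted neighbouring regions of the enhanced image are matched against those of the input image. The region size, the wrap-around at image borders, and the (B, 3, H, W) tensor layout are assumptions.

```python
import torch
import torch.nn.functional as F

def spatial_consistency_loss(enhanced, original, region_size=4):
    """Lspa sketch: local contrast of the enhanced image Y should match that
    of the input image I over the four neighbouring regions Omega(i)."""
    # average grayscale value of the pixels in each local region
    Y = F.avg_pool2d(enhanced.mean(dim=1, keepdim=True), region_size)
    I = F.avg_pool2d(original.mean(dim=1, keepdim=True), region_size)
    loss = 0.0
    # shifts realising the four neighbouring regions of each region i
    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        dY = Y - torch.roll(Y, shifts=(dy, dx), dims=(2, 3))
        dI = I - torch.roll(I, shifts=(dy, dx), dims=(2, 3))
        loss = loss + (dY.abs() - dI.abs()).pow(2).mean()
    return loss / 4.0
```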
                  an exposure control loss, denoted as Lexp: 
Lexp = (1/M) Σ_{K=1}^{M} |Y_K − E|, wherein E represents a grayscale level of an image pixel in an RGB color space; M represents a plurality of non-overlapping local regions; Y represents the average grayscale value of the pixels in the local region of the enhanced image, and the size of the local region is {K: K∈[1, M]};
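A sketch of the exposure control loss under the same assumptions; the well-exposedness level E = 0.6 and the 16×16 region size are illustrative choices, not values fixed by the claim.

```python
import torch.nn.functional as F

def exposure_control_loss(enhanced, E=0.6, region_size=16):
    """Lexp sketch: the average grayscale of each non-overlapping local region
    of the enhanced image is pulled towards a well-exposed level E."""
    Y = F.avg_pool2d(enhanced.mean(dim=1, keepdim=True), region_size)
    return (Y - E).abs().mean()
```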
a color constancy loss, denoted as Lcol:
Lcol = Σ_{(p,q)∈ε} (Jp − Jq)², wherein Jp represents an average grayscale value of pixels in a channel p of the enhanced image, Jq represents an average grayscale value of pixels in a channel q of the enhanced image; (p, q) represents any pair of channels selected from (R,G), (R,B), (G,B), and ε represents a set of (R,G), (R,B), (G,B);
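The color constancy loss can be sketched by penalising differences between the per-channel means Jp of the enhanced image over the three channel pairs in ε; the tensor layout is the same assumption as above.

```python
def color_constancy_loss(enhanced):
    """Lcol sketch: channel-wise means J_p of the enhanced image should agree
    for every channel pair (p, q) in {(R,G), (R,B), (G,B)}."""
    J = enhanced.mean(dim=(2, 3))          # per-channel average grayscale value
    r, g, b = J[:, 0], J[:, 1], J[:, 2]
    return ((r - g) ** 2 + (r - b) ** 2 + (g - b) ** 2).mean()
```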
a luminance smoothness loss, denoted as Ltv:
Ltv = (1/N) Σ_{n=1}^{N} Σ_{c∈ξ} (|∇x E_n^c| + |∇y E_n^c|)², wherein E_n^c represents a parametric curve mapping in each state; N represents a count of iterations for image enhancement in the reinforcement learning; ∇x represents a horizontal gradient operation, ∇y represents a vertical gradient operation; ξ denotes a set of R, G, and B channels in the enhanced image; and
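A sketch of the luminance smoothness loss over the N per-iteration parametric curve maps E_n^c; the sequence-of-(B, 3, H, W)-tensors layout and the mean-of-squared-gradients form are assumptions.

```python
def luminance_smoothness_loss(curve_maps):
    """Ltv sketch: horizontal and vertical gradients of each per-channel
    parametric curve map should be small so that neighbouring pixels are
    enhanced with similar curves.

    `curve_maps` is a sequence of N tensors of shape (B, 3, H, W), one per
    enhancement iteration.
    """
    loss = 0.0
    for A in curve_maps:
        grad_x = A[:, :, :, 1:] - A[:, :, :, :-1]   # horizontal gradient
        grad_y = A[:, :, 1:, :] - A[:, :, :-1, :]   # vertical gradient
        loss = loss + grad_x.pow(2).mean() + grad_y.pow(2).mean()
    return loss / len(curve_maps)
```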
                  an aesthetic quality loss, denoted as Leva; 
                  in order to score the aesthetic quality of the enhanced image, two additional image aesthetic scoring deep learning network models, denoted as a Model1 and a Model2, are introduced to calculate the aesthetic quality loss; a color and luminance attribute of the enhanced image and a quality attribute of the enhanced image are used to train the Model1 and the Model2, respectively; and the aesthetic quality loss is scored using an additionally introduced aesthetic evaluation model including the following equation: 
Leva = α·f1 + β·f2, wherein f1 denotes a score of the color and luminance attribute of the enhanced image, which is a score output by the Model1 when the enhanced image is input to the Model1; f2 denotes a score of the quality attribute of the enhanced image, which is a score output by the Model2 when the enhanced image is input to the Model2, and the higher the score is, the better the quality of the enhanced image is; α and β are weight coefficients;
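A sketch of the aesthetic quality loss Leva = α·f1 + β·f2; model1 and model2 stand for the two pretrained aesthetic scoring networks (Model1 and Model2), assumed here to map an image batch to scalar scores, and the default weight values are placeholders.

```python
def aesthetic_quality_loss(enhanced, model1, model2, alpha=0.5, beta=0.5):
    """Leva sketch: alpha * f1 + beta * f2, where f1 is the colour/luminance
    attribute score from Model1 and f2 is the quality attribute score from
    Model2; higher scores indicate a better enhanced image."""
    f1 = model1(enhanced)   # score of the colour and luminance attribute
    f2 = model2(enhanced)   # score of the quality attribute
    return alpha * f1 + beta * f2
```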
                  a goal of image enhancement is to make the immediate environmental reward value r(t) as large as possible; the smaller the spatial consistency loss, the exposure control loss, the color constancy loss, and the luminance smoothness loss are, the better the quality of the enhanced image is; the larger the aesthetic quality loss is, the better the quality of the enhanced image is; thus, the immediate environmental reward value r(t) at the time t is represented as follows: 
r(t) = Leva − (Lspa + Lexp + Lcol + Ltv); the environmental reward value at the time t, taking into account the influence factors, is expressed as follows:
R(t) = Σ_i γ^i·[Leva(t+i) − (Lspa(t+i) + Lexp(t+i) + Lcol(t+i) + Ltv(t+i))];
S3.2, training on the training dataset based on the historical phase images to obtain the value network output value;
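Putting the pieces together, the immediate environmental reward value r(t) can be sketched by reusing the loss functions outlined above; equal weighting of the terms is an assumption.

```python
def immediate_reward(enhanced, original, curve_maps, model1, model2):
    """r(t) sketch: penalise the four image-quality losses and reward the
    aesthetic quality loss, so that maximising r(t) makes the former small
    and the latter large."""
    l_spa = spatial_consistency_loss(enhanced, original)
    l_exp = exposure_control_loss(enhanced)
    l_col = color_constancy_loss(enhanced)
    l_tv = luminance_smoothness_loss(curve_maps)
    l_eva = aesthetic_quality_loss(enhanced, model1, model2)
    return l_eva - (l_spa + l_exp + l_col + l_tv)
```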
                S3.3, updating the value network using the following equation based on the environmental reward value and the value network output value: 
dθv = ∂(R(t) − V(s(t)))²/∂θv, wherein θv represents a value network parameter;
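A sketch of the value-network update, assuming value_net maps a batch of states to V(s(t)) and optimizer is a torch.optim optimizer over θv; the squared error (R(t) − V(s(t)))² follows the claim, while the gradient-descent step itself is standard practice assumed here.

```python
import torch.nn.functional as F

def value_update(value_net, optimizer, states, returns):
    """Value-network update sketch: minimise (R(t) - V(s(t)))^2 so that the
    value network output tracks the environmental reward value R(t)."""
    values = value_net(states)            # V(s(t)) for every sampled state
    loss = F.mse_loss(values, returns)    # (R(t) - V(s(t)))^2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                      # gradient step on theta_v
    return loss.item()
```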
                S3.4, updating the policy network based on the environmental reward value and a predicted value using the following equations: 
dθp = ∇_{θp} log π(a(t)|s(t))·(R(t) − V(s(t))), wherein θp represents a parameter of the policy network; the output of the policy network is the policy π(a(t)|s(t)) for taking the action a(t)∈A; π(a(t)|s(t)) is a probability calculated by a softmax function; A represents an action space; an output dimension of the policy network is |A|.
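A sketch of the policy-network update in the same setting, assuming policy_net returns the softmax probabilities π(·|s(t)) and actions holds the indices of the taken actions a(t); the log-probability-times-advantage form is the standard actor-critic rule assumed here to realise the claimed update of θp.

```python
import torch

def policy_update(policy_net, optimizer, states, actions, returns, values):
    """Policy-network update sketch: move theta_p along
    grad log pi(a(t)|s(t)) * (R(t) - V(s(t)))."""
    probs = policy_net(states)                    # pi(.|s(t)), shape (B, |A|)
    log_pi = torch.log(probs.gather(1, actions.unsqueeze(1)).squeeze(1))
    advantage = (returns - values).detach()       # R(t) - V(s(t))
    loss = -(log_pi * advantage).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                              # gradient step on theta_p
    return loss.item()
```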