| CPC G06F 30/27 (2020.01) [G06F 2111/10 (2020.01)] | 1 Claim |

1. A soft measurement method for dioxin emission of grate furnace MSWI process based on simplified deep forest regression of residual fitting mechanism, comprising:
a feature selection module based on mutual information (MI) and significance test (ST) and a simplified deep forest regression (SDFR) module based on a residual fitting mechanism; wherein the feature selection module selects corresponding features by calculating the MI value and ST value of each feature; for the SDFR module, Layer-k represents a k-th layer model, ŷ1Regvec represents an output vector of a first layer model, v1Augfea represents the augmented regression vector input to a second layer, ȳkRegvec represents an average value of ŷkRegvec, α is the learning rate applied between successive layers; x and xSel respectively represent process data before and after feature selection; y, ŷ and e are a true value, a predicted value and a prediction error, respectively;
in addition, {δMI, δSL, θ, T, α, K} represents a learning parameter set of the proposed SDFR-ref, where: δMI represents a threshold of MI, δSL represents a threshold of significance level, θ represents a minimum number of samples in a leaf node, T represents a number of decision trees in each layer of the model, α is the learning rate in a gradient boosting process, and K represents the number of layers; a globally optimized selection of these learning parameters is capable of improving synergy between different modules, thereby improving an overall performance of the model; wherein a proposed modeling strategy is formulated as solving the following optimization problem:
$$\{\delta_{MI}^{*},\delta_{SL}^{*},\theta^{*},T^{*},\alpha^{*},K^{*}\}=\arg\min\frac{1}{N}\sum_{n=1}^{N}\bigl(y_{n}-F_{\text{SDFR-ref}}\bigl(f_{\text{FeaSel}}(D)\bigr)\bigr)^{2},\quad\text{s.t.}\ I_{\mathbb{R}^{N\times M_{Sel}}}\bigl(X^{Sel}\bigr)=1$$
wherein, FSDFR-ref(⋅) represents the SDFR-ref model; fFeaSel(⋅) represents the proposed nonlinear feature selection algorithm; N represents a number of modeling samples; yn represents an n-th true value; c1,lCART represents a predicted value of the l-th leaf node of the first CART, cT,lCART represents a predicted value of the l-th leaf node of the T-th CART; D={X,y|X∈RN×M,y∈RN×1} represents the original modeling data and the input of the feature selection algorithm, M is a number of original features; IRN×MSel(XSel) is an indicator function, wherein when XSel∈RN×MSel, then IRN×MSel(XSel)=1, and when XSel∉RN×MSel, then IRN×MSel(XSel)=0;
Feature selection based on MI and ST;
wherein MI and ST are used to calculate an information correlation between the original features and dioxin (DXN) values, and to achieve a best selection of features through preset thresholds;
wherein for an input data set, the proposed nonlinear feature selection algorithm fFeaSel(⋅) is defined as follows:
$$D^{Sel}=f_{\text{FeaSel}}(D)=\bigl\{X^{Sel},y\,\big|\,X^{Sel}\in\mathbb{R}^{N\times M_{Sel}},\,y\in\mathbb{R}^{N\times 1}\bigr\}$$
wherein, DSel represents an output of the proposed feature selection algorithm, and MSel is a number of selected features;
wherein MI provides an information quantification measure of a degree of statistical dependence between random variables, and estimates a degree of interdependence between two random variables to express shared information, with a calculation process as follows:
$$MI(x_{i},y)=\sum_{n=1}^{N}p(x_{n,i},y_{n})\log\frac{p(x_{n,i},y_{n})}{p(x_{n,i})\,p(y_{n})}$$
wherein, xi is an i-th feature vector of x, xn,i is an n-th value of the i-th feature vector; p(xn,i, yn) represents a joint probability density of xn,i and yn; p(xn,i) and p(yn) represent marginal probability densities of xn,i and yn;
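As a non-limiting sketch, the MI screening step can be written in Python; scikit-learn's mutual_info_regression uses a nearest-neighbor estimator rather than the discrete densities above, and the threshold value shown is an assumed example rather than a value fixed by the claim:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def select_by_mi(X, y, delta_mi=0.05):
    """Keep the columns of X whose estimated MI with y exceeds delta_mi.

    X: (N, M) process data; y: (N,) DXN values. delta_mi plays the role of
    the claim's threshold delta_MI; 0.05 is purely illustrative.
    """
    mi = mutual_info_regression(X, y)      # one MI estimate per feature column
    keep = np.flatnonzero(mi > delta_mi)   # indices forming the set X^MI
    return X[:, keep], keep, mi
```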
wherein when the MI value of a feature is greater than the threshold δMI, the feature is regarded as an important feature and added to a preliminary feature set XMI; ST is then used to analyze a correlation between the features selected based on MI and to remove collinear features;
a Pearson coefficient value PCoe between the selected features xiMI and xjMI is calculated as follows:
$$P_{Coe}=\frac{\sum_{n=1}^{N}\bigl(x_{n,i}^{MI}-\bar{x}_{i}^{MI}\bigr)\bigl(x_{n,j}^{MI}-\bar{x}_{j}^{MI}\bigr)}{\sqrt{\sum_{n=1}^{N}\bigl(x_{n,i}^{MI}-\bar{x}_{i}^{MI}\bigr)^{2}}\sqrt{\sum_{n=1}^{N}\bigl(x_{n,j}^{MI}-\bar{x}_{j}^{MI}\bigr)^{2}}}$$
wherein, x̄iMI and x̄jMI represent average values of xiMI and xjMI respectively, and xn,iMI and xn,jMI represent an n-th value of xiMI and xjMI; a Z-test is used to calculate the ztest value between features xiMI and xjMI:
$$z_{test}=\frac{\bar{x}_{i}^{MI}-\bar{x}_{j}^{MI}}{\sqrt{S_{i}^{2}/N_{i}+S_{j}^{2}/N_{j}}}$$
wherein, Si and Sj represent standard deviations of xiMI and xjMI; Ni and Nj represent numbers of samples of xiMI and xjMI;
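For illustration, the pairwise statistics can be computed as below; scipy.stats.norm replaces the claim's table lookup for the p-value, and the helper name is a hypothetical choice:

```python
import numpy as np
from scipy import stats

def pairwise_stats(xi, xj):
    """Return (P_Coe, two-sided p-value) for two MI-selected feature columns.

    Mirrors the P_Coe and z_test formulas above; the two-sided normal
    p-value stands in for looking z_test up in a table.
    """
    p_coe = np.corrcoef(xi, xj)[0, 1]                 # Pearson coefficient
    z = (xi.mean() - xj.mean()) / np.sqrt(
        xi.var(ddof=1) / len(xi) + xj.var(ddof=1) / len(xj)
    )
    p_value = 2.0 * (1.0 - stats.norm.cdf(abs(z)))    # two-sided p-value
    return p_coe, p_value
```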
wherein a p-value is obtained by looking up the ztest value in a standard normal table; the null hypothesis H0 presumes that there is no linear relationship between the i-th and j-th features, and the linear relationship measured by the Pearson coefficient PCoe is regarded as the alternative hypothesis H1; based on a comparison of the p-value with the significance level δSL, a final selected feature set XSel including the preferred features is determined; wherein the criteria are expressed as follows:
$$x_{j}^{MI}\in X^{Sel}\iff p\text{-value}\geq\delta_{SL}\ (\text{accept }H_{0}),\qquad x_{j}^{MI}\notin X^{Sel}\iff p\text{-value}<\delta_{SL}\ (\text{reject }H_{0})$$
wherein based on the above hypotheses, collinear features selected by MI are removed, thereby reducing the impact of data noise on a training model;
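A compact sketch of the whole ST stage under this reading (retain a feature when its pairwise p-values are at least δSL, drop it otherwise); the retain/remove direction is an interpretation of the criteria above, and the function name is illustrative:

```python
import numpy as np
from scipy import stats

def significance_filter(X_mi, delta_sl=0.05):
    """Drop collinear columns from the MI-selected matrix X_mi of shape (N, M_MI).

    For each candidate column, the mean-difference z-statistic against every
    already-kept column is turned into a two-sided p-value; p < delta_sl is
    read as rejecting H0 (no linear relationship), so the column is dropped.
    """
    n, m = X_mi.shape
    keep = []
    for j in range(m):
        collinear = False
        for i in keep:
            xi, xj = X_mi[:, i], X_mi[:, j]
            z = (xi.mean() - xj.mean()) / np.sqrt(
                xi.var(ddof=1) / n + xj.var(ddof=1) / n
            )
            if 2.0 * (1.0 - stats.norm.cdf(abs(z))) < delta_sl:
                collinear = True
                break
        if not collinear:
            keep.append(j)
    return X_mi[:, keep], keep
```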
wherein a training set after feature selection is recorded as DSel; an SDFR algorithm replaces a forest algorithm in the original DFR with a decision tree, that is, CART; each layer contains multiple decision trees, and tree nodes are divided using a squared error minimization criterion; a minimum loss function of this process is expressed as follows:
$$\min_{j,s}\Bigl[\min_{c_{Left}^{CART}}\sum_{x_{n}^{Sel}\in R_{Left}(j,s)}\bigl(y_{n}-c_{Left}^{CART}\bigr)^{2}+\min_{c_{Right}^{CART}}\sum_{x_{n}^{Sel}\in R_{Right}(j,s)}\bigl(y_{n}-c_{Right}^{CART}\bigr)^{2}\Bigr]$$
wherein, cLeftCART and cRightCART are the outputs of the RLeft and RRight nodes respectively; yLeft and yRight represent the true values in the RLeft and RRight nodes respectively;
specifically, the nodes are determined in the following way:
$$R_{Left}(j,s)=\bigl\{x^{Sel}\,\big|\,x_{j}^{Sel}\leq s\bigr\},\qquad R_{Right}(j,s)=\bigl\{x^{Sel}\,\big|\,x_{j}^{Sel}>s\bigr\}$$
wherein, j and s represent a segmentation feature and a segmentation value respectively; xjSel is a j-th feature value of the selected feature vector xSel; therefore, CART can be expressed as:
$$h^{CART}\bigl(x^{Sel}\bigr)=\sum_{l=1}^{L}c_{l}^{CART}\,I_{R_{l}^{CART}}\bigl(x^{Sel}\bigr)$$
wherein, L represents a number of CART leaf nodes, clCART represents an output of the l-th leaf node of CART, and IRlCART(xSel) is the indicator function, with IRlCART(xSel)=1 when xSel∈RlCART, and IRlCART(xSel)=0 when xSel∉RlCART;
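A single CART of this form can be fit with scikit-learn for illustration; min_samples_leaf plays the role of θ, and the data and parameter values are assumed:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_sel = rng.normal(size=(200, 6))     # stand-in for the selected features x^Sel
y = X_sel[:, 0] - 0.5 * X_sel[:, 2] + rng.normal(scale=0.1, size=200)

# Squared-error splitting as in the loss above; min_samples_leaf = theta.
cart = DecisionTreeRegressor(criterion="squared_error", min_samples_leaf=5)
cart.fit(X_sel, y)
y_hat = cart.predict(X_sel)           # piecewise-constant leaf outputs c_l^CART
```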
a first-level model containing multiple CARTs is represented as follows:
$$f_{1}^{SDFR}(\cdot)=\bigl\{h_{1,1}^{CART}(\cdot),\ldots,h_{1,T}^{CART}(\cdot)\bigr\}$$
wherein, f1SDFR(⋅) represents the first layer model in SDFR, T represents a number of CARTs in each layer model, and h1,tCART(⋅) represents a t-th CART model in layer 1;
wherein, a first-layer regression vector ŷ1Regvec from a first-layer model f1SDFR(⋅) is expressed as follows:
$$\hat{y}_{1}^{Regvec}=\Bigl[h_{1,1}^{CART}\bigl(x^{Sel}\bigr),\ldots,h_{1,T}^{CART}\bigl(x^{Sel}\bigr)\Bigr]=\Bigl[\sum_{l=1}^{L}c_{1,l}^{CART}I_{R_{l}^{CART}}\bigl(x^{Sel}\bigr),\ldots,\sum_{l=1}^{L}c_{T,l}^{CART}I_{R_{l}^{CART}}\bigl(x^{Sel}\bigr)\Bigr]$$
wherein, c1,lCART represents the predicted value of the l-th leaf node of the first CART, and cT,lCART represents the predicted value of the l-th leaf node of the T-th CART;
the augmented regression vector v1Augfea is obtained by merging the first layer regression vector ŷ1Regvec with the selected features xSel and is expressed as follows:
$$v_{1}^{Augfea}=f_{FeaCom_{1}}\bigl(x^{Sel},\hat{y}_{1}^{Regvec}\bigr)=\bigl[x^{Sel},\hat{y}_{1}^{Regvec}\bigr]$$
wherein, fFeaCom1(⋅) represents a feature vector combination function;
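One layer of the cascade can then be sketched as follows; bootstrap resampling is an added assumption used to diversify otherwise identical trees, and the helper name is hypothetical:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_layer(X_in, X_sel, targets, T=5, theta=5, seed=0):
    """Fit the T CARTs of one layer on (X_in, targets).

    Returns the trees, the regression vector (one column per tree) and the
    augmented matrix [x^Sel, y_hat^Regvec] used as the next layer's input.
    """
    rng = np.random.default_rng(seed)
    trees, cols = [], []
    for _ in range(T):
        idx = rng.integers(0, len(targets), size=len(targets))  # bootstrap
        tree = DecisionTreeRegressor(min_samples_leaf=theta)
        tree.fit(X_in[idx], targets[idx])
        trees.append(tree)
        cols.append(tree.predict(X_in))
    regvec = np.column_stack(cols)            # y_hat^Regvec, shape (N, T)
    augfea = np.hstack([X_sel, regvec])       # v^Augfea
    return trees, regvec, augfea
```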
v1Augfea is then used as a feature input for a next layer; a DXN true value is no longer used in subsequent cascade modules, but a new true value is recalculated through a gradient boosting strategy; therefore, the following formula is used to calculate a loss function of the squared error:
$$L_{1}^{SDFR}\bigl(y_{n}^{(1)},f_{1}^{SDFR}(x_{n}^{Sel})\bigr)=\bigl(y_{n}^{(1)}-f_{1}^{SDFR}(x_{n}^{Sel})\bigr)^{2}$$
wherein, L1SDFR(⋅) represents the squared error loss function in SDFR-ref; yn(1) represents an n-th true value of a first layer training set;
the loss function L1SDFR is further used to calculate a gradient direction as shown below;
$$\sigma_{1,n}^{SDFR}=-\Bigl[\frac{\partial L_{1}^{SDFR}\bigl(y_{n}^{(1)},f(x_{n}^{Sel})\bigr)}{\partial f(x_{n}^{Sel})}\Bigr]_{f(\cdot)=f_{0}^{SDFR}(\cdot)}=y_{n}^{(1)}-f_{0}^{SDFR}\bigl(x_{n}^{Sel}\bigr)$$
wherein, σ1,nSDFR is the gradient of the n-th true value of layer 1; f0SDFR(⋅) represents an arithmetic mean of the initial true values, that is,
$$f_{0}^{SDFR}(\cdot)=\frac{1}{N}\sum_{n=1}^{N}y_{n}$$
wherein yn represents the n-th true value;
wherein, an objective function is:
$$f_{1}^{SDFR}\bigl(x^{Sel}\bigr)=f_{0}^{SDFR}\bigl(x^{Sel}\bigr)+\alpha\sum_{l=1}^{L}c_{1,l}^{CART}\,I_{R_{1,l}}\bigl(x^{Sel}\bigr)$$
wherein, f1SDFR(⋅) is the first layer model; α represents the learning rate; IR(xSel) represents an indicator function with IR(xSel)=1 when xSel∈R, and IR(xSel)=0 when xSel∉R;
therefore, a true value of a second level is:
$$y_{2}=y_{1}-f_{0}^{SDFR}-\alpha\,\bar{y}_{1}^{Regvec}$$
wherein, y1 is the true value of the first layer model, that is, y1=y, where y is the true value vector of DXN; ȳ1Regvec represents a mean value of the first layer regression vector;
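The residual bookkeeping is small enough to show numerically; the values below are toy data, and the update encodes the reading y2 = y1 − f0 − α·ȳ1Regvec:

```python
import numpy as np

y = np.array([2.0, 1.0, 3.0, 2.5])                # DXN true values (toy)
alpha = 0.5                                       # learning rate
f0 = y.mean()                                     # f_0^SDFR = 2.125

y1_regvec_mean = np.array([0.1, -0.3, 0.4, 0.2])  # per-sample mean over T trees
y2 = y - f0 - alpha * y1_regvec_mean              # targets for the second layer
```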
the training set of a k-th layer based on an augmented regression vector of a (k−1)-th layer is expressed as DkSel={{v(k-1),nAugfea}n=1N,yk}, v(k-1)Augfea is the augmented regression vector of the (k−1)-th layer, and yk is a k-th true value;
first, a k-th level decision tree hkCART(⋅) is established according to formulas (7) and (8); a k-th level model is expressed as follows:
$$f_{k}^{SDFR}(\cdot)=\bigl\{h_{k,1}^{CART}(\cdot),\ldots,h_{k,T}^{CART}(\cdot)\bigr\}$$
wherein, fkSDFR(⋅) represents the k-th layer model, and hk,tCART(⋅) represents the t-th CART model of the k-th layer;
then, the augmented regression vector vkAugfea of the k-th layer is expressed as follows:
$$v_{k}^{Augfea}=f_{FeaCom_{k}}\bigl(x^{Sel},\hat{y}_{k}^{Regvec}\bigr)=\bigl[x^{Sel},\hat{y}_{k}^{Regvec}\bigr]$$
wherein, ŷkRegvec represents the regression vector of the k-th layer, that is, ŷkRegvec=[hk,1CART(⋅), . . . , hk,TCART(⋅)];
then, the gradient σkSDFR is calculated according to formulas (12) and (13); a true value of the (k+1)-th layer is expressed as follows:
$$y_{k+1}=y_{k}-\alpha\,\bar{y}_{k}^{Regvec}$$
the K-th layer is the last layer of the SDFR-ref training process, that is, the preset maximum number of layers, and its training set is DKSel={{v(K-1),nAugfea}n=1N,yK};
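An end-to-end training loop consistent with this recursion might look as follows; it is a sketch under the stated reading (layer 1 fits y1 = y, with f0 entering through the y2 update), and bootstrap resampling is again an added assumption:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def train_sdfr_ref(X_sel, y, K=3, T=5, theta=5, alpha=0.5, seed=0):
    """Train a K-layer cascade in the spirit of SDFR-ref (illustrative only)."""
    rng = np.random.default_rng(seed)
    f0 = float(y.mean())                             # f_0^SDFR
    layers, inputs, targets = [], X_sel, y.copy()    # y_1 = y
    for k in range(1, K + 1):
        trees, cols = [], []
        for _ in range(T):
            idx = rng.integers(0, len(y), size=len(y))   # bootstrap sample
            tree = DecisionTreeRegressor(min_samples_leaf=theta)
            tree.fit(inputs[idx], targets[idx])
            trees.append(tree)
            cols.append(tree.predict(inputs))
        regvec = np.column_stack(cols)               # y_hat_k^Regvec
        layers.append(trees)
        mean_vote = regvec.mean(axis=1)              # bar{y}_k^Regvec
        # y_2 = y_1 - f0 - alpha*mean; thereafter y_{k+1} = y_k - alpha*mean
        targets = targets - (f0 if k == 1 else 0.0) - alpha * mean_vote
        inputs = np.hstack([X_sel, regvec])          # v_k^Augfea for next layer
    return f0, layers
```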
first, a decision tree model hKCART(⋅) is built through the training set DKSel and the K-th layer model fKSDFR(⋅) is further obtained; then, the K-th layer regression vector ŷKRegvec is calculated according to the input augmented regression vector v(K-1)Augfea, which is expressed as follows:
$$\hat{y}_{K}^{Regvec}=\Bigl[h_{K,1}^{CART}\bigl(v_{(K-1)}^{Augfea}\bigr),\ldots,h_{K,T}^{CART}\bigl(v_{(K-1)}^{Augfea}\bigr)\Bigr]$$
wherein, hK,1CART(⋅) represents the first CART model of the K-th layer, and hK,TCART(⋅) represents the T-th CART model of the K-th layer;
finally, an output value after gradient boosting with the learning rate α is:
$$\hat{y}=f_{0}^{SDFR}+\alpha\sum_{k=1}^{K}\bar{y}_{k}^{Regvec}$$
wherein, ȳkRegvec represents a mean value of the k-th layer regression vector;
after multiple layers are superimposed, each layer is used to reduce the residual of the previous layer; finally, the SDFR-ref model can be expressed as:
$$F_{\text{SDFR-ref}}\bigl(x^{Sel}\bigr)=f_{0}^{SDFR}\bigl(x^{Sel}\bigr)+\alpha\sum_{k=1}^{K}\frac{1}{T}\sum_{t=1}^{T}\sum_{l=1}^{L}c_{k,t,l}^{CART}\,I_{R_{k,t,l}^{CART}}\bigl(x^{Sel}\bigr)$$
wherein, IR(xSel) means IR(xSel)=1 when xSel∈R, and IR(xSel)=0 when xSel∉R;
wherein FSDFR-ref(⋅) is calculated based on addition, and the final predicted value is not simply averaged; the mean value of the regression vector of each layer is first calculated as follows, taking layer 1 as an example:
$$\bar{y}_{1}^{Regvec}=\frac{1}{T}\sum_{t=1}^{T}h_{1,t}^{CART}\bigl(x^{Sel}\bigr)$$
the K per-layer mean predicted values are then added to obtain the final predicted value, as shown below:
$$\hat{y}=f_{0}^{SDFR}\bigl(x^{Sel}\bigr)+\alpha\sum_{k=1}^{K}\bar{y}_{k}^{Regvec}$$
and
wherein, ŷ is a predicted value of the SDFR-ref model; IR(xSel) means IR(xSel)=1 when xSel∈R, and IR(xSel)=0 when xSel∉R.
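Prediction then adds f0 and α times each layer's mean vote, re-augmenting the input between layers; this pairs with the hypothetical train_sdfr_ref sketch above:

```python
import numpy as np

def predict_sdfr_ref(f0, layers, X_sel, alpha=0.5):
    """Apply a trained cascade: y_hat = f0 + alpha * sum_k bar{y}_k^Regvec."""
    inputs = X_sel
    y_hat = np.full(X_sel.shape[0], f0)
    for trees in layers:
        regvec = np.column_stack([t.predict(inputs) for t in trees])
        y_hat = y_hat + alpha * regvec.mean(axis=1)   # add layer's mean vote
        inputs = np.hstack([X_sel, regvec])           # v_k^Augfea
    return y_hat
```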