| CPC G06F 30/27 (2020.01) [G06F 2111/10 (2020.01)] | 1 Claim |

1. A soft measurement method for dioxin emission of grate furnace MSWI process based on simplified deep forest regression of residual fitting mechanism, comprising:
a feature selection module based on mutual information (MI) and significance test (ST) and a simplified deep forest regression (SDFR) module based on a residual fitting mechanism; wherein the feature selection module selects corresponding features by calculating the MI value and ST value of each feature; for the SDFR module, Layer-k represents a k-th layer model, ŷ1Regvec represents an output vector of a first layer model, v1Augfea represents the augmented regression vector input to a second layer, ȳkRegvec represents an average value of ŷkRegvec, α is the learning rate applied between successive layers; x and xSel respectively represent process data before and after feature selection; y, ŷ and e are a true value, a predicted value and a prediction error, respectively;
in addition, {δMI, δSL, θ, T, α, K} represents a learning parameter set of the proposed SDFR-ref, where: δMI represents a threshold of MI, δSL represents a threshold of significance level, θ represents a minimum number of samples in a leaf node, T represents a number of decision trees in each layer of the model, α is the learning rate in a gradient boosting process, and K represents the number of layers; a globally optimized selection of these learning parameters is capable of improving synergy between different modules, thereby improving an overall performance of the model; wherein a proposed modeling strategy is formulated as solving the following optimization problem:
$$\{\delta_{MI}^{*},\delta_{SL}^{*},\theta^{*},T^{*},\alpha^{*},K^{*}\}=\arg\min\frac{1}{N}\sum_{n=1}^{N}\bigl(y_{n}-F_{\text{SDFR-ref}}\bigl(f_{\text{FeaSel}}(D)\bigr)\bigr)^{2},\quad\text{s.t.}\ I_{\mathbb{R}^{N\times M_{Sel}}}\bigl(X^{Sel}\bigr)=1$$
wherein, FSDFR-ref(⋅) represents the SDFR-ref model; fFeaSel(⋅) represents the proposed nonlinear feature selection algorithm; N represents a number of modeling samples; yn represents an n-th true value; c1,lCART represents a predicted value of the l-th leaf node of the first CART, cT,lCART represents a predicted value of the l-th leaf node of the T-th CART; D={X,y|X∈RN×M,y∈RN×1} represents the original modeling data and the input of the feature selection algorithm, M is a number of original features; IRN×MSel(XSel) is an indicator function, wherein when XSel∈RN×MSel, then IRN×MSel(XSel)=1, and when XSel∉RN×MSel, then IRN×MSel(XSel)=0;
Feature selection based on MI and ST;
wherein MI and ST are used to calculate an information correlation between the original features and dioxin (DXN) values, and to achieve a best selection of features through preset thresholds;
wherein for an input data set, the proposed nonlinear feature selection algorithm fFeaSel(⋅) is defined as follows:
$$D^{Sel}=f_{\text{FeaSel}}(D)=\bigl\{X^{Sel},y\,\big|\,X^{Sel}\in\mathbb{R}^{N\times M_{Sel}},\,y\in\mathbb{R}^{N\times 1}\bigr\}$$
wherein, DSel represents an output of the proposed feature selection algorithm, and MSel is a number of selected features;
wherein MI provides an information quantification measure of a degree of statistical dependence between random variables, and estimates a degree of interdependence between two random variables to express shared information, with a calculation process as follows:
$$MI(x_{i},y)=\sum_{n=1}^{N}p(x_{n,i},y_{n})\log\frac{p(x_{n,i},y_{n})}{p(x_{n,i})\,p(y_{n})}$$
wherein, xi is an i-th feature vector of x, xn,i is an n-th value of the i-th feature vector; p(xn,i, yn) represents a joint probability density of xn,i and yn; p(xn,i) and p(yn) represent marginal probability densities of xn,i and yn;
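As a non-limiting sketch, the MI screening step can be written in Python; scikit-learn's mutual_info_regression uses a nearest-neighbor estimator rather than the discrete densities above, and the threshold value shown is an assumed example rather than a value fixed by the claim:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def select_by_mi(X, y, delta_mi=0.05):
    """Keep the columns of X whose estimated MI with y exceeds delta_mi.

    X: (N, M) process data; y: (N,) DXN values. delta_mi plays the role of
    the claim's threshold delta_MI; 0.05 is purely illustrative.
    """
    mi = mutual_info_regression(X, y)      # one MI estimate per feature column
    keep = np.flatnonzero(mi > delta_mi)   # indices forming the set X^MI
    return X[:, keep], keep, mi
```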
wherein when the MI value of a feature is greater than the threshold δMI, the feature is regarded as an important feature and added to a preliminary feature set XMI; ST is then used to analyze a correlation between the features selected based on MI and to remove collinear features;
a Pearson coefficient value PCoe between the selected features xiMI and xjMI is calculated as follows:
$$P_{Coe}=\frac{\sum_{n=1}^{N}\bigl(x_{n,i}^{MI}-\bar{x}_{i}^{MI}\bigr)\bigl(x_{n,j}^{MI}-\bar{x}_{j}^{MI}\bigr)}{\sqrt{\sum_{n=1}^{N}\bigl(x_{n,i}^{MI}-\bar{x}_{i}^{MI}\bigr)^{2}}\sqrt{\sum_{n=1}^{N}\bigl(x_{n,j}^{MI}-\bar{x}_{j}^{MI}\bigr)^{2}}}$$
wherein, x̄iMI and x̄jMI represent average values of xiMI and xjMI respectively, and xn,iMI and xn,jMI represent an n-th value of xiMI and xjMI; a Z-test is used to calculate the ztest value between features xiMI and xjMI:
$$z_{test}=\frac{\bar{x}_{i}^{MI}-\bar{x}_{j}^{MI}}{\sqrt{S_{i}^{2}/N_{i}+S_{j}^{2}/N_{j}}}$$
wherein, Si and Sj represent standard deviations of xiMI and xjMI; Ni and Nj represent numbers of samples of xiMI and xjMI;
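For illustration, the pairwise statistics can be computed as below; scipy.stats.norm replaces the claim's table lookup for the p-value, and the helper name is a hypothetical choice:

```python
import numpy as np
from scipy import stats

def pairwise_stats(xi, xj):
    """Return (P_Coe, two-sided p-value) for two MI-selected feature columns.

    Mirrors the P_Coe and z_test formulas above; the two-sided normal
    p-value stands in for looking z_test up in a table.
    """
    p_coe = np.corrcoef(xi, xj)[0, 1]                 # Pearson coefficient
    z = (xi.mean() - xj.mean()) / np.sqrt(
        xi.var(ddof=1) / len(xi) + xj.var(ddof=1) / len(xj)
    )
    p_value = 2.0 * (1.0 - stats.norm.cdf(abs(z)))    # two-sided p-value
    return p_coe, p_value
```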
wherein a p-value is obtained by looking up the ztest value in a standard normal table; the null hypothesis H0 presumes that there is no linear relationship between the i-th and j-th features, and the linear relationship measured by the Pearson coefficient PCoe is regarded as the alternative hypothesis H1; based on a comparison of the p-value with the significance level δSL, a final selected feature set XSel including the preferred features is determined; wherein the criteria are expressed as follows:
$$x_{j}^{MI}\in X^{Sel}\iff p\text{-value}\geq\delta_{SL}\ (\text{accept }H_{0}),\qquad x_{j}^{MI}\notin X^{Sel}\iff p\text{-value}<\delta_{SL}\ (\text{reject }H_{0})$$
wherein based on the above hypotheses, collinear features selected by MI are removed, thereby reducing the impact of data noise on a training model;
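A compact sketch of the whole ST stage under this reading (retain a feature when its pairwise p-values are at least δSL, drop it otherwise); the retain/remove direction is an interpretation of the criteria above, and the function name is illustrative:

```python
import numpy as np
from scipy import stats

def significance_filter(X_mi, delta_sl=0.05):
    """Drop collinear columns from the MI-selected matrix X_mi of shape (N, M_MI).

    For each candidate column, the mean-difference z-statistic against every
    already-kept column is turned into a two-sided p-value; p < delta_sl is
    read as rejecting H0 (no linear relationship), so the column is dropped.
    """
    n, m = X_mi.shape
    keep = []
    for j in range(m):
        collinear = False
        for i in keep:
            xi, xj = X_mi[:, i], X_mi[:, j]
            z = (xi.mean() - xj.mean()) / np.sqrt(
                xi.var(ddof=1) / n + xj.var(ddof=1) / n
            )
            if 2.0 * (1.0 - stats.norm.cdf(abs(z))) < delta_sl:
                collinear = True
                break
        if not collinear:
            keep.append(j)
    return X_mi[:, keep], keep
```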
wherein a training set after feature selection is recorded as DSel; an SDFR algorithm replaces a forest algorithm in the original DFR with a decision tree, that is, CART; each layer contains multiple decision trees, and tree nodes are divided using a squared error minimization criterion; a minimum loss function of this process is expressed as follows:
$$\min_{j,s}\Bigl[\min_{c_{Left}^{CART}}\sum_{x_{n}^{Sel}\in R_{Left}(j,s)}\bigl(y_{n}-c_{Left}^{CART}\bigr)^{2}+\min_{c_{Right}^{CART}}\sum_{x_{n}^{Sel}\in R_{Right}(j,s)}\bigl(y_{n}-c_{Right}^{CART}\bigr)^{2}\Bigr]$$
wherein, cLeftCART and cRightCART are the outputs of the RLeft and RRight nodes respectively; yLeft and yRight represent the true values in the RLeft and RRight nodes respectively;
specifically, the nodes are determined in the following way:
$$R_{Left}(j,s)=\bigl\{x^{Sel}\,\big|\,x_{j}^{Sel}\leq s\bigr\},\qquad R_{Right}(j,s)=\bigl\{x^{Sel}\,\big|\,x_{j}^{Sel}>s\bigr\}$$
wherein, j and s represent a segmentation feature and a segmentation value respectively; xjSel is a j-th feature value of the selected feature vector xSel; therefore, CART can be expressed as:
$$h^{CART}\bigl(x^{Sel}\bigr)=\sum_{l=1}^{L}c_{l}^{CART}\,I_{R_{l}^{CART}}\bigl(x^{Sel}\bigr)$$
wherein, L represents a number of CART leaf nodes, clCART represents an output of the l-th leaf node of CART, and IRlCART(xSel) is the indicator function, with IRlCART(xSel)=1 when xSel∈RlCART, and IRlCART(xSel)=0 when xSel∉RlCART;
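A single CART of this form can be fit with scikit-learn for illustration; min_samples_leaf plays the role of θ, and the data and parameter values are assumed:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_sel = rng.normal(size=(200, 6))     # stand-in for the selected features x^Sel
y = X_sel[:, 0] - 0.5 * X_sel[:, 2] + rng.normal(scale=0.1, size=200)

# Squared-error splitting as in the loss above; min_samples_leaf = theta.
cart = DecisionTreeRegressor(criterion="squared_error", min_samples_leaf=5)
cart.fit(X_sel, y)
y_hat = cart.predict(X_sel)           # piecewise-constant leaf outputs c_l^CART
```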
a first-level model containing multiple CARTs is represented as follows:
$$f_{1}^{SDFR}(\cdot)=\bigl\{h_{1,1}^{CART}(\cdot),\ldots,h_{1,T}^{CART}(\cdot)\bigr\}$$
wherein, f1SDFR(⋅) represents the first layer model in SDFR, T represents a number of CARTs in each layer model, and h1,tCART(⋅) represents a t-th CART model in layer 1;
wherein, a first-layer regression vector ŷ1Regvec from a first-layer model f1SDFR(⋅) is expressed as follows:
$$\hat{y}_{1}^{Regvec}=\Bigl[h_{1,1}^{CART}\bigl(x^{Sel}\bigr),\ldots,h_{1,T}^{CART}\bigl(x^{Sel}\bigr)\Bigr]=\Bigl[\sum_{l=1}^{L}c_{1,l}^{CART}I_{R_{l}^{CART}}\bigl(x^{Sel}\bigr),\ldots,\sum_{l=1}^{L}c_{T,l}^{CART}I_{R_{l}^{CART}}\bigl(x^{Sel}\bigr)\Bigr]$$
wherein, c1,lCART represents the predicted value of the l-th leaf node of the first CART, and cT,lCART represents the predicted value of the l-th leaf node of the T-th CART;
the augmented regression vector v1Augfea is obtained by merging the first layer regression vector ŷ1Regvec with the selected features xSel and is expressed as follows:
$$v_{1}^{Augfea}=f_{FeaCom_{1}}\bigl(x^{Sel},\hat{y}_{1}^{Regvec}\bigr)=\bigl[x^{Sel},\hat{y}_{1}^{Regvec}\bigr]$$
wherein, fFeaCom1(⋅) represents a feature vector combination function;
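One layer of the cascade can then be sketched as follows; bootstrap resampling is an added assumption used to diversify otherwise identical trees, and the helper name is hypothetical:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_layer(X_in, X_sel, targets, T=5, theta=5, seed=0):
    """Fit the T CARTs of one layer on (X_in, targets).

    Returns the trees, the regression vector (one column per tree) and the
    augmented matrix [x^Sel, y_hat^Regvec] used as the next layer's input.
    """
    rng = np.random.default_rng(seed)
    trees, cols = [], []
    for _ in range(T):
        idx = rng.integers(0, len(targets), size=len(targets))  # bootstrap
        tree = DecisionTreeRegressor(min_samples_leaf=theta)
        tree.fit(X_in[idx], targets[idx])
        trees.append(tree)
        cols.append(tree.predict(X_in))
    regvec = np.column_stack(cols)            # y_hat^Regvec, shape (N, T)
    augfea = np.hstack([X_sel, regvec])       # v^Augfea
    return trees, regvec, augfea
```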
v1Augfea is then used as a feature input for a next layer; a DXN true value is no longer used in subsequent cascade modules, but a new true value is recalculated through a gradient boosting strategy; therefore, the following formula is used to calculate a loss function of the squared error:
$$L_{1}^{SDFR}\bigl(y_{n}^{(1)},f_{1}^{SDFR}(x_{n}^{Sel})\bigr)=\bigl(y_{n}^{(1)}-f_{1}^{SDFR}(x_{n}^{Sel})\bigr)^{2}$$
wherein, L1SDFR(⋅) represents the squared error loss function in SDFR-ref; yn(1) represents an n-th true value of a first layer training set;
the loss function L1SDFR is further used to calculate a gradient direction as shown below;
$$\sigma_{1,n}^{SDFR}=-\Bigl[\frac{\partial L_{1}^{SDFR}\bigl(y_{n}^{(1)},f(x_{n}^{Sel})\bigr)}{\partial f(x_{n}^{Sel})}\Bigr]_{f(\cdot)=f_{0}^{SDFR}(\cdot)}=y_{n}^{(1)}-f_{0}^{SDFR}\bigl(x_{n}^{Sel}\bigr)$$
wherein, σ1,nSDFR is the gradient of the n-th true value of layer 1; f0SDFR(⋅) represents an arithmetic mean of the initial true values, that is,
$$f_{0}^{SDFR}(\cdot)=\frac{1}{N}\sum_{n=1}^{N}y_{n}$$
wherein yn represents the n-th true value;
wherein, an objective function is:
$$f_{1}^{SDFR}\bigl(x^{Sel}\bigr)=f_{0}^{SDFR}\bigl(x^{Sel}\bigr)+\alpha\sum_{l=1}^{L}c_{1,l}^{CART}\,I_{R_{1,l}}\bigl(x^{Sel}\bigr)$$
wherein, f1SDFR(⋅) is the first layer model; α represents the learning rate; IR(xSel) represents an indicator function with IR(xSel)=1 when xSel∈R, and IR(xSel)=0 when xSel∉R;
therefore, a true value of a second level is:
$$y_{2}=y_{1}-f_{0}^{SDFR}-\alpha\,\bar{y}_{1}^{Regvec}$$
wherein, y1 is the true value of the first layer model, that is, y1=y, where y is the true value vector of DXN; ȳ1Regvec represents a mean value of the first layer regression vector;
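The residual bookkeeping is small enough to show numerically; the values below are toy data, and the update encodes the reading y2 = y1 − f0 − α·ȳ1Regvec:

```python
import numpy as np

y = np.array([2.0, 1.0, 3.0, 2.5])                # DXN true values (toy)
alpha = 0.5                                       # learning rate
f0 = y.mean()                                     # f_0^SDFR = 2.125

y1_regvec_mean = np.array([0.1, -0.3, 0.4, 0.2])  # per-sample mean over T trees
y2 = y - f0 - alpha * y1_regvec_mean              # targets for the second layer
```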
the training set of a k-th layer based on an augmented regression vector of a (k−1)-th layer is expressed as DkSel={{v(k-1),nAugfea}n=1N,yk}, v(k-1)Augfea is the augmented regression vector of the (k−1)-th layer, and yk is a k-th true value;
first, a k-th level decision tree hkCART(⋅) is established according to formulas (7) and (8); a k-th level model is expressed as follows:
$$f_{k}^{SDFR}(\cdot)=\bigl\{h_{k,1}^{CART}(\cdot),\ldots,h_{k,T}^{CART}(\cdot)\bigr\}$$
wherein, fkSDFR(⋅) represents the k-th layer model, and hk,tCART(⋅) represents the t-th CART model of the k-th layer;
then, the augmented regression vector vkAugfea of the k-th layer is expressed as follows:
$$v_{k}^{Augfea}=f_{FeaCom_{k}}\bigl(x^{Sel},\hat{y}_{k}^{Regvec}\bigr)=\bigl[x^{Sel},\hat{y}_{k}^{Regvec}\bigr]$$
wherein, ŷkRegvec represents the regression vector of the k-th layer, that is, ŷkRegvec=[hk,1CART(⋅), . . . , hk,TCART(⋅)];
then, the gradient σkSDFR is calculated according to formulas (12) and (13); a true value of the (k+1)-th layer is expressed as follows:
$$y_{k+1}=y_{k}-\alpha\,\bar{y}_{k}^{Regvec}$$
the K-th layer is the last layer of the SDFR-ref training process, that is, the preset maximum number of layers, and its training set is DKSel={{v(K-1),nAugfea}n=1N,yK};
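An end-to-end training loop consistent with this recursion might look as follows; it is a sketch under the stated reading (layer 1 fits y1 = y, with f0 entering through the y2 update), and bootstrap resampling is again an added assumption:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def train_sdfr_ref(X_sel, y, K=3, T=5, theta=5, alpha=0.5, seed=0):
    """Train a K-layer cascade in the spirit of SDFR-ref (illustrative only)."""
    rng = np.random.default_rng(seed)
    f0 = float(y.mean())                             # f_0^SDFR
    layers, inputs, targets = [], X_sel, y.copy()    # y_1 = y
    for k in range(1, K + 1):
        trees, cols = [], []
        for _ in range(T):
            idx = rng.integers(0, len(y), size=len(y))   # bootstrap sample
            tree = DecisionTreeRegressor(min_samples_leaf=theta)
            tree.fit(inputs[idx], targets[idx])
            trees.append(tree)
            cols.append(tree.predict(inputs))
        regvec = np.column_stack(cols)               # y_hat_k^Regvec
        layers.append(trees)
        mean_vote = regvec.mean(axis=1)              # bar{y}_k^Regvec
        # y_2 = y_1 - f0 - alpha*mean; thereafter y_{k+1} = y_k - alpha*mean
        targets = targets - (f0 if k == 1 else 0.0) - alpha * mean_vote
        inputs = np.hstack([X_sel, regvec])          # v_k^Augfea for next layer
    return f0, layers
```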
first, a decision tree model hKCART(⋅) is built through the training set DKSel and the K-th layer model fKSDFR(⋅) is further obtained; then, the K-th layer regression vector ŷKRegvec is calculated according to the input augmented regression vector v(K-1)Augfea, which is expressed as follows:
$$\hat{y}_{K}^{Regvec}=\Bigl[h_{K,1}^{CART}\bigl(v_{(K-1)}^{Augfea}\bigr),\ldots,h_{K,T}^{CART}\bigl(v_{(K-1)}^{Augfea}\bigr)\Bigr]$$
wherein, hK,1CART(⋅) represents the first CART model of the K-th layer, and hK,TCART(⋅) represents the T-th CART model of the K-th layer;
finally, an output value after gradient boosting with the learning rate α is:
$$\hat{y}=f_{0}^{SDFR}+\alpha\sum_{k=1}^{K}\bar{y}_{k}^{Regvec}$$
wherein, ȳkRegvec represents a mean value of the k-th layer regression vector;
after multiple layers are superimposed, each layer is used to reduce the residual of the previous layer; finally, the SDFR-ref model can be expressed as:
$$F_{\text{SDFR-ref}}\bigl(x^{Sel}\bigr)=f_{0}^{SDFR}\bigl(x^{Sel}\bigr)+\alpha\sum_{k=1}^{K}\frac{1}{T}\sum_{t=1}^{T}\sum_{l=1}^{L}c_{k,t,l}^{CART}\,I_{R_{k,t,l}^{CART}}\bigl(x^{Sel}\bigr)$$
wherein, IR(xSel) means IR(xSel)=1 when xSel∈R, and IR(xSel)=0 when xSel∉R;
wherein FSDFR-ref(⋅) is calculated based on addition, and the final predicted value is not simply averaged; the mean value of the regression vector of each layer is first calculated as follows, taking layer 1 as an example:
$$\bar{y}_{1}^{Regvec}=\frac{1}{T}\sum_{t=1}^{T}h_{1,t}^{CART}\bigl(x^{Sel}\bigr)$$
the K per-layer mean predicted values are then added to obtain the final predicted value, as shown below:
$$\hat{y}=f_{0}^{SDFR}\bigl(x^{Sel}\bigr)+\alpha\sum_{k=1}^{K}\bar{y}_{k}^{Regvec}$$
and
wherein, ŷ is a predicted value of the SDFR-ref model; IR(xSel) means IR(xSel)=1 when xSel∈R, and IR(xSel)=0 when xSel∉R.
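Prediction then adds f0 and α times each layer's mean vote, re-augmenting the input between layers; this pairs with the hypothetical train_sdfr_ref sketch above:

```python
import numpy as np

def predict_sdfr_ref(f0, layers, X_sel, alpha=0.5):
    """Apply a trained cascade: y_hat = f0 + alpha * sum_k bar{y}_k^Regvec."""
    inputs = X_sel
    y_hat = np.full(X_sel.shape[0], f0)
    for trees in layers:
        regvec = np.column_stack([t.predict(inputs) for t in trees])
        y_hat = y_hat + alpha * regvec.mean(axis=1)   # add layer's mean vote
        inputs = np.hstack([X_sel, regvec])           # v_k^Augfea
    return y_hat
```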