US 11,964,655 B2
Backward anti-collision driving decision-making method for heavy commercial vehicle
Xu Li, Nanjing (CN); Weiming Hu, Nanjing (CN); Jinchao Hu, Nanjing (CN); and Xuefen Zhu, Nanjing (CN)
Assigned to SOUTHEAST UNIVERSITY, Nanjing (CN)
Appl. No. 17/766,870
Filed by SOUTHEAST UNIVERSITY, Nanjing (CN)
PCT Filed Apr. 12, 2021, PCT No. PCT/CN2021/086570 § 371(c)(1), (2) Date Apr. 6, 2022, PCT Pub. No. WO2022/126940, PCT Pub. Date Jun. 23, 2022.
Claims priority of application No. 202011512719.5 (CN), filed on Dec. 20, 2020.
Prior Publication US 2023/0182725 A1, Jun. 15, 2023
Int. Cl. B60W 30/095 (2012.01); G06N 3/045 (2023.01); G06N 3/0464 (2023.01)

CPC B60W 30/0956 (2013.01) [G06N 3/045 (2023.01); G06N 3/0464 (2023.01); B60W 2300/125 (2013.01); B60W 2420/408 (2024.01); B60W 2554/4041 (2020.02); B60W 2554/80 (2020.02)]

1 Claim

1. A backward anti-collision driving decision-making method for a heavy commercial vehicle, wherein the method comprises the following steps:

step I: establishing a virtual traffic environment model: for high-class highways, establishing a virtual traffic environment model, that is, a three-lane virtual environment model comprising straight lanes and curved lanes, wherein the heavy commercial vehicle moves in the traffic environment model, a target vehicle follows the heavy commercial vehicle, and in the process there are 4 different running conditions, comprising acceleration, deceleration, uniform velocity and lane change;

in a process of establishing the virtual traffic environment model, vehicle movement state information is obtained in real time through a centimeter-level high-precision differential GPS, an inertia measurement unit and a millimeter wave radar mounted on each vehicle, comprising positions, velocity, acceleration, relative distance and relative velocity of the two vehicles; a type of the target vehicle is obtained in real time through a visual sensor mounted at a rear part of the vehicle; and driver control information is read through a CAN bus, comprising a throttle opening and a steering wheel angle of the vehicle;

the target vehicle refers to a vehicle located behind the heavy commercial vehicle on a running road, located within the same lane line, running in the same direction and closest to the heavy commercial vehicle, including 3 types: small, medium and large vehicles;

step II: establishing a backward collision risk assessment model, specifically comprising:

firstly, calculating time required for collision between the heavy commercial vehicle and the target vehicle:

in formula (1), RTTC(t) represents backward distance collision time at time t in units of seconds, x_c(t) represents vehicle distance in units of meters, v_F(t) and v_R(t) respectively represent the velocity of the heavy commercial vehicle and the target vehicle, v_r(t) represents the relative velocity of the two vehicles in units of meters per second, and v_r(t)=v_F(t)−v_R(t);
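
Formula (1) itself is not reproduced in this text. As a minimal sketch, assuming the backward distance collision time is the vehicle distance divided by the closing speed (the target vehicle's velocity minus the heavy commercial vehicle's velocity, that is, −v_r(t) under the sign convention above), it could be computed as follows. The function name `rttc` and the choice to return infinity when the gap is not closing are assumptions, not part of the claim.

```python
import math


def rttc(x_c: float, v_f: float, v_r_target: float) -> float:
    """Assumed backward time to collision in seconds.

    x_c: distance between the two vehicles in meters.
    v_f: velocity of the heavy commercial vehicle (front) in m/s.
    v_r_target: velocity of the target vehicle (rear) in m/s.
    """
    v_rel = v_f - v_r_target  # v_r(t) = v_F(t) - v_R(t), as defined in the claim
    closing_speed = -v_rel    # positive when the rear vehicle is catching up
    if closing_speed <= 0.0:
        return math.inf       # assumption: gap is not closing, so no finite time
    return x_c / closing_speed
```

For example, a 20 m gap with the target vehicle 5 m/s faster gives `rttc(20.0, 20.0, 25.0)`, which evaluates to 4.0 seconds under this assumed form.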

secondly, calculating a backward collision risk level; when the backward distance collision time is not less than 2.1 seconds and not more than 4.4 seconds, giving a backward collision alarm, indicating that a backward collision early warning system has passed a test; and based on this, quantifying the backward collision risk level:

in formula (2), δ_w represents a quantified value of a backward collision risk; when δ_w≥1, it indicates that there is no backward collision risk; when 0.5≤δ_w≤1, it indicates that there is a backward collision risk; and when 0≤δ_w≤0.5, it indicates that the backward collision risk level is very high;
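
Formula (2) is not reproduced in this text. The sketch below assumes a linear normalization of the backward distance collision time between the 2.1 s and 4.4 s bounds stated above, which is consistent with δ_w≥1 meaning no risk; the actual formula may differ. The three bands come from the claim, whose endpoints overlap at 0.5 and 1, so the code resolves each shared endpoint toward the lower-risk band. The function names are hypothetical.

```python
RTTC_LOW = 2.1   # seconds, lower alarm bound stated in the claim
RTTC_HIGH = 4.4  # seconds, upper alarm bound stated in the claim


def risk_value(rttc_s: float) -> float:
    """Assumed linear mapping of RTTC onto delta_w (not taken from the source)."""
    delta = (rttc_s - RTTC_LOW) / (RTTC_HIGH - RTTC_LOW)
    return max(delta, 0.0)  # assumption: clamp so delta_w never goes negative


def risk_label(delta_w: float) -> str:
    """Bands from the claim; shared endpoints go to the lower-risk band."""
    if delta_w >= 1.0:
        return "no risk"
    if delta_w >= 0.5:
        return "risk"
    return "very high risk"
```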

step III: establishing a backward anti-collision driving decision-making model of the heavy commercial vehicle: comprehensively considering the influence of traffic environment, vehicle operation state, rear vehicle type and backward collision risk level on backward collision, establishing a backward anti-collision driving decision-making model of the heavy commercial vehicle by adopting a PPO algorithm, and performing interactive iterative learning with a target vehicle movement random process model to obtain an optimal backward anti-collision decision, specifically comprising the following 4 sub-steps:

sub-step 1: defining basic parameters of the backward anti-collision driving decision-making model

firstly, describing a backward anti-collision driving decision-making problem as a Markov decision-making process (S,A,P,r) under a certain reward function, wherein S is a state space, A is a backward anti-collision action decision, P is a state transition probability caused by movement uncertainty of the target vehicle, and r is a reward function; and secondly, defining basic parameters of the Markov decision-making process, specifically comprising:

(1) defining a state space

establishing a state space expression by using the vehicle movement state information output in step I and the backward collision risk level output in step II:

S_t=(v_{F_lon},a_{F_lon},v_{r_lon},a_{r_lon},θ_str,p_thr,L_r,δ_w,T_m) (3)

in formula (3), S_t represents a state space at time t, v_{F_lon} and v_{r_lon} respectively represent the longitudinal velocity of the heavy commercial vehicle and the relative longitudinal velocity of the two vehicles in units of meters per second, a_{F_lon} and a_{r_lon} respectively represent the longitudinal acceleration of the heavy commercial vehicle and the relative longitudinal acceleration of the two vehicles in units of meters per square second, θ_str represents a steering wheel angle of the vehicle in units of degrees, p_thr represents a throttle opening as a percentage, L_r represents a relative vehicle distance in units of meters, δ_w and T_m respectively represent the backward collision risk level and the type of the target vehicle, m=1, 2, 3 respectively represent that the target vehicle is a large vehicle, a medium vehicle and a small vehicle, and T_m=m in the present invention;
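
As an illustration of formula (3), the nine state components can be held in a typed record whose field order follows the claim. The class name, the field names and the sample values are assumptions chosen for readability.

```python
from typing import NamedTuple


class State(NamedTuple):
    """State S_t of formula (3); field order follows the claim."""
    v_f_lon: float    # longitudinal velocity of the heavy vehicle, m/s
    a_f_lon: float    # longitudinal acceleration of the heavy vehicle, m/s^2
    v_r_lon: float    # relative longitudinal velocity, m/s
    a_r_lon: float    # relative longitudinal acceleration, m/s^2
    theta_str: float  # steering wheel angle, degrees
    p_thr: float      # throttle opening, percent
    l_r: float        # relative vehicle distance, m
    delta_w: float    # backward collision risk level
    t_m: int          # target vehicle type: 1 large, 2 medium, 3 small


# Illustrative values only; they are not taken from the source.
s = State(22.0, 0.1, -3.0, 0.0, 1.5, 30.0, 18.0, 0.6, 2)
```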

(2) defining an action decision

in order to comprehensively consider the influence of transverse movement and longitudinal movement on backward collision, defining a driving policy, that is, an action decision output by the decision-making model, by using the steering wheel angle and the throttle opening as control quantities in the present invention:

A_t=[θ_{str_out},p_{thr_out}] (4)

in formula (4), A_t represents an action decision at time t, θ_{str_out} represents a normalized steering wheel angle control quantity in a range of [−1, 1], and p_{thr_out} represents a normalized throttle opening control quantity in a range of [0, 1]; when p_{thr_out}=0, it indicates that the vehicle does not accelerate, and when p_{thr_out}=1, it indicates that the vehicle accelerates at a maximum acceleration;
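
The normalized action of formula (4) has to be scaled before it can be applied to the vehicle. The following sketch shows one way to do that, assuming a hypothetical steering wheel limit of 540 degrees (the source gives no limit) and a throttle opening expressed as a percentage. Out-of-range inputs are clamped to the ranges stated in the claim.

```python
from typing import Tuple


def clamp(x: float, lo: float, hi: float) -> float:
    """Limit x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def to_commands(theta_out: float, p_out: float,
                max_steer_deg: float = 540.0) -> Tuple[float, float]:
    """Scale a normalized action A_t to a steering angle (deg) and throttle (%).

    max_steer_deg is a hypothetical steering wheel limit, not from the source.
    """
    theta = clamp(theta_out, -1.0, 1.0) * max_steer_deg
    throttle = clamp(p_out, 0.0, 1.0) * 100.0
    return theta, throttle
```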

(3) establishing a reward function

in order to evaluate the advantages and disadvantages of the action decision, establishing a reward function to concretize and digitalize the evaluation; and considering that backward anti-collision driving decision-making is a multi-objective optimization problem involving safety, comfort and other objectives, designing the reward function as follows:

r_t=r₁+r₂+r₃ (5)

in formula (5), r_t represents a reward function at time t, r₁ represents a safety distance reward function, r₂ represents a comfort reward function, and r₃ represents a penalty function:

firstly, designing a safety distance reward function r₁:

in formula (6), L_r and L_s respectively represent relative vehicle distance and a safety distance threshold, ω_d represents a safety distance weight coefficient, valued as ω_d=0.85 in the present invention;

secondly, designing a comfort reward function r₂:

r₂=ω_j|a_{F_lon}(t+1)−a_{F_lon}(t)| (7)

in formula (7), ω_j is a comfort weight coefficient, valued as ω_j=0.95 in the present invention;
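
Formulas (5) and (7) can be sketched directly, following the signs exactly as they appear in this text. Formulas (6) and (8) are not reproduced here, so r₁ and r₃ are taken as plain inputs rather than computed. The function names are assumptions.

```python
OMEGA_J = 0.95  # comfort weight coefficient from the claim


def comfort_reward(a_next: float, a_now: float) -> float:
    """r2 of formula (7): weighted absolute change in longitudinal acceleration."""
    return OMEGA_J * abs(a_next - a_now)


def total_reward(r1: float, r2: float, r3: float) -> float:
    """r_t of formula (5): plain sum of the three terms."""
    return r1 + r2 + r3
```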

finally, designing a penalty function r₃:

(4) designing an expected maximum policy

in formula (9), π* is an expected maximum policy, π is a backward anti-collision decision-making policy, γ is a discount factor, γ∈(0,1), and τ(π) represents trajectory distribution under policy π;
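
Formula (9) is not reproduced in this text. In formulations of this kind the quantity being maximized is usually the expected discounted sum of rewards over trajectories drawn from τ(π); assuming that reading, the sketch below computes the discounted sum for one sampled trajectory. The function name is hypothetical.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**t * r_t over one trajectory, with 0 < gamma < 1."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in (0, 1)")
    total = 0.0
    for r in reversed(rewards):  # backward accumulation: G = r + gamma * G
        total = r + gamma * total
    return total
```

Averaging this value over many trajectories sampled under a policy gives an estimate of the expectation that the policy search would try to maximize.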

sub-step 2: designing a network architecture of the backward anti-collision driving decision-making model

setting up a backward anti-collision driving decision-making network by using an “Actor-Critic” network framework, comprising an Actor network and a Critic network, wherein the Actor network uses state space information as an input and outputs an action decision, that is, the throttle opening and steering wheel angle control quantities of the heavy commercial vehicle; the Critic network uses the state space information and the action decision as an input, and outputs a value of current “state-action”, specifically comprising:

(1) designing an Actor network

establishing a hierarchical coder structure and respectively extracting features of various information in the state space; firstly, constructing 3 serially connected convolution layers (C_F1, C_F2, C_F3) and 1 maximum pooling layer (P₁), extracting features of the movement state information (longitudinal velocity, longitudinal acceleration, steering wheel angle, and throttle opening) of the vehicle, and coding them into an intermediate feature vector h₁; extracting features of the relative movement state information (relative longitudinal velocity, relative longitudinal acceleration, and relative vehicle distance) of the two vehicles by using the same structure, that is, 3 serially connected convolution layers (C_R1, C_R2, C_R3) and 1 maximum pooling layer (P₂), and coding them into an intermediate feature vector h₂; extracting features of the collision risk level and the type of the target vehicle by using a convolution layer C_W1 and a maximum pooling layer P₃, and coding them into an intermediate feature vector h₃; and secondly, combining the features h₁, h₂ and h₃ and connecting full connection layers FC₄ and FC₅ to output the action decision,

wherein the number of neurons of the convolution layers C_F1, C_F2, C_F3, C_R1, C_R2, C_R3 and C_W1 is set to be 20, 20, 10, 20, 20, 10 and 20 respectively; the number of neurons of the full connection layers FC₄ and FC₅ is set to be 200; the activation function of each convolution layer and full connection layer is a Rectified Linear Unit (ReLU), and an expression thereof is f(x)=max (0, x);

(2) designing a Critic network

establishing a Critic network by using a neural network with a multiple hidden layer structure; firstly, inputting a state space S_t into a hidden layer FC_C1; at the same time, inputting an action decision A_t into a hidden layer FC_C2; secondly, combining the hidden layers FC_C1 and FC_C2 by tensor addition; and finally, after passing through the full connection layers FC_C3 and FC_C4 sequentially, outputting a value of the Critic network,

wherein the number of neurons of the layers FC_C1 and FC_C2 is set to be 400, the number of neurons of other hidden layers is set to be 200, and the activation function of each layer is a ReLU;
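
The Critic data flow described above can be sketched as a forward pass in plain Python. The layer widths (400, 400, 200, 200) and the tensor addition come from the claim. The random placeholder weights, the point at which ReLU is applied after the addition, and the final 200-to-1 output layer are assumptions, since the text does not specify them. The class and attribute names are hypothetical.

```python
import random
from typing import List

Vector = List[float]


class Dense:
    """Fully connected layer; weights are random placeholders, not trained."""

    def __init__(self, n_in: int, n_out: int, rng: random.Random) -> None:
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                  for _ in range(n_out)]
        self.b = [0.0] * n_out

    def __call__(self, x: Vector) -> Vector:
        return [sum(wi * xi for wi, xi in zip(row, x)) + bi
                for row, bi in zip(self.w, self.b)]


def relu(x: Vector) -> Vector:
    """f(x) = max(0, x), applied element-wise."""
    return [max(0.0, xi) for xi in x]


class Critic:
    def __init__(self, state_dim: int = 9, action_dim: int = 2,
                 seed: int = 0) -> None:
        rng = random.Random(seed)
        self.fc_c1 = Dense(state_dim, 400, rng)   # state branch
        self.fc_c2 = Dense(action_dim, 400, rng)  # action branch
        self.fc_c3 = Dense(400, 200, rng)
        self.fc_c4 = Dense(200, 200, rng)
        self.out = Dense(200, 1, rng)             # assumed scalar output layer

    def value(self, state: Vector, action: Vector) -> float:
        h_s = self.fc_c1(state)
        h_a = self.fc_c2(action)
        # Tensor addition of the two branches, then ReLU (ordering assumed).
        h = relu([a + b for a, b in zip(h_s, h_a)])
        h = relu(self.fc_c3(h))
        h = relu(self.fc_c4(h))
        return self.out(h)[0]
```

Calling `Critic().value(state, action)` with a 9-element state and a 2-element action returns a single scalar, which plays the role of the "state-action" value in this sketch.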

sub-step 3: training the backward anti-collision driving decision-making model

performing gradient updating to the network parameters by using loss functions J_actor and J_critic, wherein a specific training process is as follows:

sub-step 3.1: initializing the Actor network and the Critic network;

sub-step 3.2: performing iterative solution, wherein each iteration comprises sub-step 3.21 to sub-step 3.23 specifically as follows:

sub-step 3.21: performing iterative solution, wherein each iteration comprises sub-step 3.211 to sub-step 3.213 as follows:

sub-step 3.211: obtaining a movement control operation of the vehicle by using the virtual traffic environment model in step I;

sub-step 3.212: obtaining sample data (S_t,A_t,r_t) by using the Actor network;

sub-step 3.213: ending a cycle to obtain a sample point set [(S₁,A₁,r₁), (S₂,A₂,r₂), ..., (S_t,A_t,r_t)];

sub-step 3.22: calculating an advantage function:

in formula (10), F_t represents an advantage function, V(S_t) represents a value function of state S_t, F_t>0 represents that the possibility of taking a current action should be increased, and F_t<0 represents that the possibility of taking the action should be decreased;
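
Formula (10) is not reproduced in this text. One common estimator, used here purely as an assumption, takes the advantage as the discounted reward-to-go from step t minus the value estimate V(S_t); the patent's own formula may differ. The function name is hypothetical.

```python
from typing import List, Sequence


def advantages(rewards: Sequence[float], values: Sequence[float],
               gamma: float) -> List[float]:
    """Return F_t = G_t - V(S_t), where G_t is the discounted reward-to-go."""
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    out = [0.0] * len(rewards)
    g = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        g = rewards[t] + gamma * g  # reward-to-go from step t
        out[t] = g - values[t]
    return out
```

A positive entry means the sampled action did better than the Critic expected, which matches the interpretation of F_t>0 given above.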

sub-step 3.23: performing iterative solution, wherein each iteration comprises sub-step 3.231 to sub-step 3.233 specifically as follows:

sub-step 3.231: calculating an objective function of the Actor network;

sub-step 3.232: updating the parameter J_actor of the Actor network:

in formula (11), p_t(θ) represents a ratio of a new policy π_θ to an old policy π_{θ_old} on action decision distribution in a policy updating process,

clip(⋅) represents a clipping function, and ε is a constant valued as ε=0.25;
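
Formula (11) is not reproduced in this text, but the symbols it names (the ratio p_t(θ), the advantage F_t, clip(⋅) and ε) are those of the standard PPO clipped surrogate objective. Assuming that standard form, the per-batch objective could be evaluated as sketched below, with ratios and advantages supplied as plain lists. The function names are hypothetical.

```python
from typing import Sequence

EPSILON = 0.25  # clipping constant from the claim


def clip(x: float, lo: float, hi: float) -> float:
    """Limit x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def clipped_objective(ratios: Sequence[float], advs: Sequence[float],
                      eps: float = EPSILON) -> float:
    """Mean of min(p_t * F_t, clip(p_t, 1 - eps, 1 + eps) * F_t)."""
    terms = [min(p * f, clip(p, 1.0 - eps, 1.0 + eps) * f)
             for p, f in zip(ratios, advs)]
    return sum(terms) / len(terms)
```

With ε=0.25, a ratio of 2.0 on a positive advantage is capped at 1.25, so a single sample cannot pull the new policy arbitrarily far from the old one.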

sub-step 3.233: updating the parameter J_critic of the Critic network:

sub-step 3.234: ending a cycle;

sub-step 3.3: performing iterative updating according to the method provided in sub-step 3.2 to make the Actor network and the Critic network converge gradually, wherein in a training process, if the vehicle has a backward collision or rollover, a current round is terminated and a new round for training is started; and when the iteration reaches the maximum number of steps or the model is capable of making a backward anti-collision driving decision stably and accurately, the training ends;
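
The round termination rule in sub-step 3.3 can be illustrated with a toy rollout loop. `ToyEnv` is a hypothetical one-dimensional stand-in for the virtual traffic environment model, with scalar states and actions; it models only a collision, whereas a real environment would also report rollover through the same termination flag. None of the names or numbers below come from the source.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, float, float]  # (state, action, reward), scalars for brevity


class ToyEnv:
    """Hypothetical stand-in for the virtual traffic environment model."""

    def __init__(self, gap: float = 3.0) -> None:
        self.gap = gap  # distance to the rear vehicle, meters

    def step(self, action: float) -> Tuple[float, float, bool]:
        self.gap += action - 1.0  # rear vehicle closes 1 m per step
        collided = self.gap <= 0.0
        reward = -10.0 if collided else 1.0
        return self.gap, reward, collided


def rollout(env: ToyEnv, policy: Callable[[float], float],
            max_steps: int) -> List[Sample]:
    """Collect samples until the round terminates or max_steps is reached."""
    samples: List[Sample] = []
    state = env.gap
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done = env.step(action)
        samples.append((state, action, reward))
        if done:  # a collision ends the current round
            break
        state = next_state
    return samples
```

A training loop would call `rollout` once per round, start a fresh environment after each terminated round, and stop when the step budget is exhausted or the policy performs stably.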

sub-step 4: outputting the decision-making policy by using the backward anti-collision decision-making model

inputting the information obtained by the centimeter-level high-precision differential GPS, the inertia measurement unit, the millimeter wave radar and the CAN bus into the trained backward anti-collision driving decision-making model, such that proper steering wheel angle and throttle opening control quantities are capable of being quantitatively output to provide an effective and reliable backward anti-collision driving suggestion for a driver, so as to realize effective, reliable and adaptive backward anti-collision driving decision-making of the heavy commercial vehicle.