US 11,914,376 B2
USV formation path-following method based on deep reinforcement learning
Yong Ma, Hubei (CN); Yujiao Zhao, Hubei (CN); and Hao Li, Hubei (CN)
Assigned to WUHAN UNIVERSITY OF TECHNOLOGY, Hubei (CN)
Filed by WUHAN UNIVERSITY OF TECHNOLOGY, Hubei (CN)
Filed on Jul. 1, 2021, as Appl. No. 17/364,884.
Claims priority of application No. 202010626769.X (CN), filed on Jul. 1, 2020.
Prior Publication US 2022/0004191 A1, Jan. 6, 2022
Int. Cl. G05D 1/02 (2020.01); G05D 1/00 (2006.01); G06N 3/045 (2023.01); G06N 7/01 (2023.01)

CPC G05D 1/0206 (2013.01) [G05D 1/0088 (2013.01); G06N 3/045 (2023.01); G06N 7/01 (2023.01)]

6 Claims

1. An unmanned surface vessel (USV) formation path-following method based on deep reinforcement learning, comprising the following steps:

step S1, constructing a decision-making neural network model;

step S2, according to simultaneous exploration of multiple underactuated USVs, extracting environmental information of water space and storing data to an experience pool;

wherein the step S2 comprises:

outputting actions through the decision-making neural network model, storing relevant states, actions and rewards in the experience pool, and then selecting the experience records from the experience pool to form a training data set, and when the training data set is used to train the decision-making neural network model, starting detection by each USV from different initial states, and storing detected motion states in the experience pool;

step S3, selecting training data from the data stored in the experience pool to train the decision-making neural network model, and optimizing a training process of deep reinforcement learning by adopting a random braking mechanism, so as to obtain USV formation pattern keeping capacity;

step S4, controlling the USVs to move along a preset path by the decision-making neural network model based on collaborative exploration, and continuously updating positions of the USVs, to realize path following of all the USVs in a formation;

wherein the step S4 comprises:

assuming that a center point of the USV formation is p_c(x_c, y_c), where x_c, y_c respectively represent an abscissa and an ordinate of the center point of the USV formation, e_d represents a cross track error between p_c and a point p_p(x_p, y_p) on the preset path, x_p, y_p represent the abscissa and the ordinate of p_p(x_p, y_p), p_v(x_v, y_v) represents an expected position of a virtual guide, x_v, y_v represent the abscissa and the ordinate of the virtual guide, USV formation path following requires the virtual guide of the USV formation to move along a parameterized preset path (x_p(θ), y_p(θ)), x_p(θ), y_p(θ) represent points on the preset path, and θ represents a path parameter;

determining the cross track error e_d of path following by

and

in a USV formation path following process, changing the position of the center point p_c of the USV formation constantly, changing a projection p_p of p_c on the preset path as well, and selecting the virtual guide p_v on the preset path according to e_d so as to maintain the distance from the point p_p; and

step S5, controlling the movement of the USV pattern according to path following, so as to keep the USV following pattern and realize formation path following.
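
The geometry recited in step S4 can be illustrated with a minimal Python sketch. It is not the patented implementation, and the formula for e_d is not reproduced in the text above. The sketch therefore assumes that e_d is the Euclidean distance between p_c and p_p, which is consistent with the definitions in the claim. The names `project_onto_path`, `select_virtual_guide`, and `base_lookahead`, the brute-force sampling of θ, and the rule that shortens the look-ahead as e_d grows are all hypothetical choices made for illustration.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float]
# A parameterized preset path: theta -> (x_p(theta), y_p(theta)).
Path = Callable[[float], Point]


def cross_track_error(p_c: Point, p_p: Point) -> float:
    """Assumed form of e_d: Euclidean distance between p_c and p_p."""
    return math.hypot(p_c[0] - p_p[0], p_c[1] - p_p[1])


def project_onto_path(p_c: Point, path: Path, theta_min: float,
                      theta_max: float, samples: int = 1000) -> Tuple[float, float]:
    """Approximate the projection p_p of p_c by sampling theta on a grid.

    Returns (theta_p, e_d) for the closest sampled path point.
    Hypothetical helper; a real system might use an analytic projection.
    """
    best_theta, best_dist = theta_min, float("inf")
    for i in range(samples + 1):
        theta = theta_min + (theta_max - theta_min) * i / samples
        dist = cross_track_error(p_c, path(theta))
        if dist < best_dist:
            best_theta, best_dist = theta, dist
    return best_theta, best_dist


def select_virtual_guide(theta_p: float, e_d: float, path: Path,
                         theta_max: float, base_lookahead: float = 2.0) -> Point:
    """Place the virtual guide p_v ahead of p_p on the path.

    Assumption (not from the claim): the look-ahead in theta shrinks
    as e_d grows, so a formation far from the path is guided back
    toward a nearby path point first.
    """
    lookahead = base_lookahead / (1.0 + e_d)
    theta_v = min(theta_p + lookahead, theta_max)
    return path(theta_v)


def straight_path(theta: float) -> Point:
    """Example preset path: the x-axis, parameterized by theta = x."""
    return (theta, 0.0)


# Formation center at (3, 4); preset path is the x-axis for theta in [0, 10].
theta_p, e_d = project_onto_path((3.0, 4.0), straight_path, 0.0, 10.0)
p_v = select_virtual_guide(theta_p, e_d, straight_path, 10.0)
print(theta_p, e_d, p_v)
```

In a following loop, these two calls would be repeated each time the formation center p_c is updated, so that p_p and p_v move along the preset path together with the formation.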