CPC G05D 1/0206 (2013.01) [G05D 1/0088 (2013.01); G06N 3/045 (2023.01); G06N 7/01 (2023.01)] | 6 Claims |
1. An unmanned surface vessel (USV) formation path-following method based on deep reinforcement learning, comprising the following steps:
step S1, constructing a decision-making neural network model;
step S2, according to simultaneous exploration of multiple underactuated USVs, extracting environmental information of water space and storing data to an experience pool;
wherein the step S2 comprises:
outputting actions through the decision-making neural network model, storing relevant states, actions and rewards in the experience pool, and then selecting the experience records from the experience pool to form a training data set, and when the training data set is used to train the decision-making neural network model, starting detection by each USV from different initial states, and storing detected motion states in the experience pool;
step S3, selecting training data from the data stored in the experience pool to train the decision-making neural network model, and optimizing a training process of deep reinforcement learning by adopting a random braking mechanism, so as to obtain USV formation pattern keeping capacity;
step S4, controlling the USVs to move along a preset path by the decision-making neural network model based on collaborative exploration, and continuously updating positions of the USVs, to realize path following of all the USVs in a formation;
wherein the step S4 comprises:
assuming that a center point of the USV formation is p_c(x_c, y_c), where x_c, y_c respectively represent an abscissa and an ordinate of the center point of the USV formation, e_d represents a cross track error between p_c and a point p_p(x_p, y_p) on the preset path, x_p, y_p represent the abscissa and the ordinate of p_p(x_p, y_p), p_v(x_v, y_v) represents an expected position of a virtual guide, x_v, y_v represent the abscissa and the ordinate of the virtual guide, USV formation path following requires the virtual guide of the USV formation to move along a parameterized preset path (x_p(θ), y_p(θ)), x_p(θ), y_p(θ) represent points on the preset path, and θ represents a path parameter;
determining the cross track error e_d of path following by
and
in a USV formation path following process, changing the position of the center point p_c of the USV formation constantly, changing a projection p_p of p_c on the preset path as well, and selecting the virtual guide p_v on the preset path according to e_d so as to maintain the distance from the point p_p; and
step S5, controlling the movement of the USV pattern according to path following, so as to keep the USV following pattern and realize formation path following.
|
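The geometry recited in step S4 can be illustrated with a minimal Python sketch. It is not the claimed implementation and rests on several assumptions. The formula for the cross track error is not reproduced in the text above, so the sketch assumes it is the Euclidean distance between the formation center and its projection on the path. The preset path is assumed to be a straight segment parameterized by θ in [0, 1]. The formation center is assumed to be the mean of the vessel positions. The rule for placing the virtual guide (a look-ahead distance that shrinks as the cross track error grows, bounded by a hypothetical `radius`) is one plausible choice, not the rule stated in the claim. All function and parameter names are hypothetical.

```python
import math


def formation_center(positions):
    """Assumed definition: the center is the mean of the USV positions."""
    n = len(positions)
    return (sum(p[0] for p in positions) / n, sum(p[1] for p in positions) / n)


def project_on_segment(pc, a, b):
    """Project pc onto the segment a->b; return (theta, pp) with theta in [0, 1]."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        raise ValueError("path segment must have non-zero length")
    theta = ((pc[0] - a[0]) * dx + (pc[1] - a[1]) * dy) / length_sq
    theta = min(1.0, max(0.0, theta))  # clamp to the ends of the segment
    return theta, (a[0] + theta * dx, a[1] + theta * dy)


def cross_track_error(pc, pp):
    """Assumed formula: Euclidean distance between pc and its projection pp."""
    return math.hypot(pc[0] - pp[0], pc[1] - pp[1])


def virtual_guide(pc, a, b, radius):
    """Return (ed, pp, pv) for a formation center pc and a straight path a->b.

    Hypothetical selection rule: the guide sits ahead of pp by
    sqrt(radius^2 - ed^2), so a larger error pulls the guide closer to pp.
    """
    theta, pp = project_on_segment(pc, a, b)
    ed = cross_track_error(pc, pp)
    lookahead = math.sqrt(max(radius * radius - ed * ed, 0.0))
    path_length = math.hypot(b[0] - a[0], b[1] - a[1])
    theta_v = min(1.0, theta + lookahead / path_length)
    pv = (a[0] + theta_v * (b[0] - a[0]), a[1] + theta_v * (b[1] - a[1]))
    return ed, pp, pv


if __name__ == "__main__":
    usv_positions = [(2.0, 3.0), (4.0, 3.0), (3.0, 6.0)]
    pc = formation_center(usv_positions)
    ed, pp, pv = virtual_guide(pc, (0.0, 0.0), (10.0, 0.0), radius=5.0)
    print("center:", pc, "error:", ed, "projection:", pp, "guide:", pv)
```

Recomputing `virtual_guide` each time the vessel positions change mirrors the continuous update of p_c, p_p and p_v described in step S4. A curved preset path would need a numerical projection in place of the closed-form one used here.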