| CPC G05D 1/637 (2024.01) [G05D 1/46 (2024.01); G05D 1/49 (2024.01)] | 5 Claims |

1. A method for UAV path planning in urban airspace based on safe reinforcement learning, comprising:
S1, collecting state information of a UAV, an urban airspace, and an urban ground environment, and defining a state of the UAV at any moment t as s_t, wherein s_t = [x_t, y_t, z_t];
S2, constructing a safe reinforcement learning architecture, called shield-DDPG, from four functional modules: an environment module, a neural network module, a shield module, and a replay buffer; and conducting training by the neural network module according to the state s_t, the neural network module comprising a main network and a target network; the shield module being constructed by linear temporal logic and specifically comprising a finite-state reactive system, a state trace, a safety specification, a Markov decision process, a safety automaton, and an observe function, the shield module acting between a main actor network and a main critic network, the main actor network outputting an action u_t;
S3, determining, by the shield module, safety of an action a_t = u_t + f_t = [a_t^x, a_t^y, a_t^z], in which f_t = ε·D_tD is an attractive force, ε is an attractive coefficient, and D_tD is a distance between a current position of the UAV and a destination point;
S4, verifying the safety of the action a_t by the shield module, and finally outputting a safe action a_t′;
S5, performing the final safe action a_t′ for state transition to obtain a next state s_{t+1} as well as a reward Reward_t; and
S6, storing the current state s_t, the final safe action a_t′, the reward Reward_t, the next state s_{t+1}, and a training flag d_t in the replay buffer, and sampling a random minibatch of transitions from the replay buffer for updating the neural network.
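The interaction loop recited in steps S3 through S6 can be sketched as follows. This is a minimal illustrative sketch, not the claimed implementation: the destination point, the attractive coefficient ε, the toy actor policy, the box-projection shield, and the reward shaping are all assumptions introduced for demonstration, and the neural network update itself is omitted.

```python
# Hedged sketch of one shield-DDPG rollout (claim steps S3-S6).
# All concrete values and helper functions below are assumed for
# illustration; only the step structure follows the claim.
import random
from collections import deque

import numpy as np

EPSILON = 0.1                            # attractive coefficient (assumed)
DEST = np.array([10.0, 10.0, 5.0])       # destination point (assumed)

def actor(state):
    """Stand-in for the main actor network's raw action u_t (toy policy)."""
    return np.clip(DEST - state, -1.0, 1.0) * 0.1

def shield(state, action):
    """Stand-in shield (S4): project the action so the next position
    stays inside an assumed safe box [0, 12]^3, yielding a_t'."""
    safe_next = np.clip(state + action, 0.0, 12.0)
    return safe_next - state

def env_step(state, action):
    """Toy state transition and reward (S5): reward is the negative
    remaining distance to the destination (assumed shaping)."""
    next_state = state + action
    dist = np.linalg.norm(DEST - next_state)
    return next_state, -dist, bool(dist < 0.5)

replay_buffer = deque(maxlen=10_000)     # S6: replay buffer
state = np.zeros(3)                      # S1: s_t = [x_t, y_t, z_t]
for t in range(200):
    u_t = actor(state)
    # S3: attractive force f_t = epsilon * D_tD, here given an assumed
    # direction toward the destination (the claim only fixes its magnitude).
    d_td = np.linalg.norm(DEST - state)
    f_t = EPSILON * d_td * (DEST - state) / max(d_td, 1e-8)
    a_t = u_t + f_t
    a_safe = shield(state, a_t)                      # S4: safe action a_t'
    next_state, reward, done = env_step(state, a_safe)  # S5
    replay_buffer.append((state, a_safe, reward, next_state, done))  # S6
    if len(replay_buffer) >= 16:
        minibatch = random.sample(replay_buffer, 16)  # S6: sampled batch
    state = next_state
    if done:
        break
```

In this sketch the shield acts as a projection onto a safe set, one common way to realize a shield; the claimed shield instead checks actions against a safety automaton derived from a linear-temporal-logic specification.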