CPC F42B 15/01 (2013.01) [G06N 3/084 (2013.01); G06N 3/092 (2023.01)] | 7 Claims |
1. A method for designing a terminal guidance law based on deep reinforcement learning, comprising the following steps:
establishing a relative kinematics equation between a missile and a target in a longitudinal plane of a target interception terminal guidance section of the missile;
abstracting the problem of solving the kinematics equation and modeling it as a Markov decision process;
building an algorithm network, setting algorithm parameters, and training the algorithm network based on a randomly initialized data set to determine weight parameters of an initial network;
continuously caching, by an agent, state transition data and reward values as learning samples in an experience pool based on a Q-Learning algorithm, and continuously selecting a fixed number of samples from the experience pool to train the network until a set number of learning rounds is reached; and
generating, during a specific guidance process, an action in real time based on a current state by using a learned network to transfer to a next state, and continuously repeating the process until the target is hit to complete the guidance process.
|
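The steps of claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: the speeds, action set, state variables, reward shape, and hit radius are all assumptions chosen for illustration, and a linear Q-function stands in for the "algorithm network" so that the example runs on the Python standard library alone. The names `los_rates`, `step`, `features`, `td_target`, and `train` are hypothetical.

```python
import math
import random
from collections import deque

V_M, V_T = 600.0, 300.0                      # missile / target speeds, m/s (assumed)
ACTIONS = [-100.0, -50.0, 0.0, 50.0, 100.0]  # normal acceleration commands, m/s^2 (assumed)
DT, GAMMA, HIT_RADIUS = 0.01, 0.99, 5.0      # time step, discount, hit threshold (assumed)

def los_rates(r, q, theta_m, theta_t):
    """Relative kinematics in the longitudinal plane: returns (r_dot, q_dot)."""
    r_dot = V_T * math.cos(q - theta_t) - V_M * math.cos(q - theta_m)
    q_dot = (V_M * math.sin(q - theta_m) - V_T * math.sin(q - theta_t)) / r
    return r_dot, q_dot

def step(state, action_index):
    """One MDP transition; state = (r, q, theta_m, theta_t), target flies straight."""
    r, q, theta_m, theta_t = state
    r_dot, q_dot = los_rates(r, q, theta_m, theta_t)
    a_m = ACTIONS[action_index]
    next_state = (r + r_dot * DT, q + q_dot * DT, theta_m + a_m / V_M * DT, theta_t)
    done = next_state[0] <= HIT_RADIUS
    # Hypothetical reward: penalize (clipped) line-of-sight rate, bonus on hit.
    reward = -min(abs(q_dot), 1.0) + (10.0 if done else 0.0)
    return next_state, reward, done

def features(state):
    """Bounded features for the linear stand-in Q-function."""
    r_dot, q_dot = los_rates(*state)
    return [1.0, math.tanh(q_dot), r_dot / V_M]

def q_values(weights, state):
    phi = features(state)
    return [sum(w * x for w, x in zip(w_a, phi)) for w_a in weights]

def select_action(weights, state, epsilon):
    """Epsilon-greedy action selection over the discrete action set."""
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))
    qs = q_values(weights, state)
    return qs.index(max(qs))

def td_target(weights, reward, next_state, done):
    """Q-learning target: r + gamma * max_a' Q(s', a'), or r at a terminal state."""
    return reward if done else reward + GAMMA * max(q_values(weights, next_state))

def train(episodes=20, batch_size=32, lr=1e-3, epsilon=0.1, max_steps=2000):
    weights = [[0.0] * 3 for _ in ACTIONS]   # one weight vector per action
    pool = deque(maxlen=10_000)              # experience pool
    for _ in range(episodes):
        # Assumed initial geometry: 5 km range, target flying head-on.
        state = (5000.0, random.uniform(-0.1, 0.1), 0.0, math.pi)
        for _ in range(max_steps):
            a = select_action(weights, state, epsilon)
            next_state, reward, done = step(state, a)
            pool.append((state, a, reward, next_state, done))
            if len(pool) >= batch_size:
                # Fixed-size minibatch drawn from the experience pool.
                for s, a_b, r_b, s2, d in random.sample(pool, batch_size):
                    error = td_target(weights, r_b, s2, d) - q_values(weights, s)[a_b]
                    phi = features(s)
                    weights[a_b] = [w + lr * error * x for w, x in zip(weights[a_b], phi)]
            state = next_state
            if done:
                break
    return weights
```

After training, guidance would amount to calling `select_action(weights, state, 0.0)` at each time step and applying `step` until `done` is returned. Replacing the linear Q-function with a neural network and a separate target network would bring the sketch closer to a deep Q-network, but those details are not specified in the claim text above.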