US 12,305,967 B1
Method for designing terminal guidance law based on deep reinforcement learning
Wenjun Yi, Nanjing (CN); Jian Huang, Nanjing (CN); Tianhong Xiong, Nanjing (CN); Guilin Jiang, Nanjing (CN); Lijun Ma, Nanjing (CN); and Shu Yang, Nanjing (CN)
Assigned to Nanjing University of Science and Technology, Nanjing (CN)
Filed by Nanjing University of Science and Technology, Nanjing (CN)
Filed on Jan. 30, 2024, as Appl. No. 18/426,961.
Int. Cl. F42B 15/01 (2006.01); G06N 3/084 (2023.01); G06N 3/092 (2023.01)
CPC F42B 15/01 (2013.01) [G06N 3/084 (2013.01); G06N 3/092 (2023.01)] 7 Claims
OG exemplary drawing
 
1. A method for designing a terminal guidance law based on deep reinforcement learning, comprising the following steps:
establishing a relative kinematics equation between a missile and a target in the longitudinal plane during the terminal guidance phase of target interception;
abstracting the problem of solving the kinematics equation and modeling it as a Markov decision process;
building an algorithm network, setting the algorithm parameters, and training the network on a randomly initialized data set to determine the weight parameters of the initial network;
continuously caching, by an agent, state transition data and reward values as learning samples in an experience pool according to a Q-Learning algorithm, and repeatedly selecting a fixed number of samples from the experience pool to train the network until the set number of learning rounds is reached; and
during an actual guidance process, generating an action in real time from the current state by using the learned network so as to transition to the next state, and repeating this process until the target is hit, completing the guidance.
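
The sketches below walk through claim 1 step by step under stated assumptions; none of the equations, architectures, or parameters that follow appear in the patent text itself. The claim's first step does not reproduce the kinematics equation; a standard form of the planar (longitudinal-plane) missile-target relative kinematics, which that step plausibly refers to, is:

```latex
\begin{aligned}
\dot{r} &= V_T \cos(q - \varphi_T) - V_M \cos(q - \varphi_M), \\
r\,\dot{q} &= -V_T \sin(q - \varphi_T) + V_M \sin(q - \varphi_M),
\end{aligned}
```

where r is the missile-target range, q the line-of-sight angle, V_M and V_T the missile and target speeds, and φ_M and φ_T their flight-path angles. All of these symbols are assumptions, since the text above does not define them.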
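For the second step, a minimal sketch of one possible Markov decision process is shown below, assuming the state is (r, ṙ, q, q̇), the action is an index into a discretized set of normal-acceleration commands, and the reward penalizes the line-of-sight rate with a terminal bonus for a hit. None of these design choices, nor the class name EngagementEnv, are specified by the claim.

```python
import numpy as np

class EngagementEnv:
    """Hypothetical MDP wrapper around the planar relative kinematics.

    State  : (r, r_dot, q, q_dot) -- range, range rate, LOS angle, LOS rate.
    Action : index into a discrete set of normal-acceleration commands.
    Reward : assumed shaping term penalizing the LOS rate, plus a terminal
             bonus once the range falls below a hit radius.
    """

    ACTIONS = np.linspace(-100.0, 100.0, 11)  # m/s^2, assumed command set

    def __init__(self, dt=0.01, hit_radius=5.0):
        self.dt, self.hit_radius = dt, hit_radius
        self.reset()

    def reset(self):
        # Assumed initial engagement geometry (not taken from the patent).
        self.r, self.q = 5000.0, np.deg2rad(30.0)
        self.vm, self.vt = 600.0, 300.0              # missile / target speeds
        self.phi_m, self.phi_t = np.deg2rad(25.0), np.deg2rad(180.0)
        return self._state()

    def _state(self):
        # Evaluate the relative kinematics given the current geometry.
        r_dot = (self.vt * np.cos(self.q - self.phi_t)
                 - self.vm * np.cos(self.q - self.phi_m))
        q_dot = (-self.vt * np.sin(self.q - self.phi_t)
                 + self.vm * np.sin(self.q - self.phi_m)) / self.r
        return np.array([self.r, r_dot, self.q, q_dot], dtype=np.float32)

    def step(self, action_idx):
        a_m = self.ACTIONS[action_idx]
        _, r_dot, _, q_dot = self._state()
        # Euler-integrate the relative kinematics and the missile turn rate.
        self.r += r_dot * self.dt
        self.q += q_dot * self.dt
        self.phi_m += (a_m / self.vm) * self.dt
        done = self.r <= self.hit_radius
        reward = -abs(q_dot) + (100.0 if done else 0.0)  # assumed shaping
        return self._state(), reward, done
```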
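The third and fourth steps describe network construction and experience-replay training. The claim's combination of a Q-Learning algorithm with an experience pool matches the familiar DQN pattern; the sketch below is one such implementation, not the patented one, and the network width, learning rate, batch size, exploration rate, and episode count are all placeholder values.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

def build_q_network(state_dim=4, n_actions=11, hidden=128):
    # Assumed architecture; the patent does not specify the layer sizes.
    return nn.Sequential(
        nn.Linear(state_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, n_actions),
    )

def train(env, episodes=500, batch_size=64, gamma=0.99, eps=0.1, lr=1e-3):
    q_net = build_q_network()
    optimizer = torch.optim.Adam(q_net.parameters(), lr=lr)
    pool = deque(maxlen=100_000)           # the claim's "experience pool"

    for _ in range(episodes):              # the "set learning rounds"
        state, done = env.reset(), False
        while not done:
            s = torch.as_tensor(state).unsqueeze(0)
            if random.random() < eps:      # assumed epsilon-greedy exploration
                action = random.randrange(len(env.ACTIONS))
            else:
                action = int(q_net(s).argmax(dim=1))
            next_state, reward, done = env.step(action)
            # Cache the state transition and reward as a learning sample.
            pool.append((state, action, reward, next_state, done))
            state = next_state

            if len(pool) >= batch_size:
                # Select a fixed number of samples from the pool, as claimed.
                batch = random.sample(pool, batch_size)
                ss, aa, rr, ns, dd = map(list, zip(*batch))
                ss = torch.as_tensor(np.array(ss))
                ns = torch.as_tensor(np.array(ns))
                aa = torch.as_tensor(aa).unsqueeze(1)
                rr = torch.as_tensor(rr, dtype=torch.float32)
                dd = torch.as_tensor(dd, dtype=torch.float32)
                with torch.no_grad():
                    # One-step Q-Learning target.
                    target = rr + gamma * (1 - dd) * q_net(ns).max(dim=1).values
                pred = q_net(ss).gather(1, aa).squeeze(1)
                loss = nn.functional.mse_loss(pred, target)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
    return q_net
```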
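The final step runs the learned network greedily online. A sketch of that closed loop, reusing the hypothetical environment and network above:

```python
def guide(env, q_net, max_steps=100_000):
    """Fly one engagement with the learned network acting greedily."""
    state, done, steps = env.reset(), False, 0
    while not done and steps < max_steps:
        s = torch.as_tensor(state).unsqueeze(0)
        with torch.no_grad():
            action = int(q_net(s).argmax(dim=1))   # real-time action
        state, _, done = env.step(action)          # transition to next state
        steps += 1
    return done  # True if the target was hit within the step budget
```

Under these assumptions, training and then flying one engagement would be q_net = train(EngagementEnv()) followed by guide(EngagementEnv(), q_net).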