US 12,035,380 B2
Industrial 5G dynamic multi-priority multi-access method based on deep reinforcement learning
Haibin Yu, Liaoning (CN); Xiaoyu Liu, Liaoning (CN); Chi Xu, Liaoning (CN); Peng Zeng, Liaoning (CN); Xi Jin, Liaoning (CN); and Changqing Xia, Liaoning (CN)
Assigned to SHENYANG INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES, Liaoning (CN)
Appl. No. 17/296,509
Filed by SHENYANG INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES, Liaoning (CN)
PCT Filed Dec. 25, 2020, PCT No. PCT/CN2020/139322
§ 371(c)(1), (2) Date May 24, 2021.
PCT Pub. No. WO2021/227508, PCT Pub. Date Nov. 18, 2021.
Claims priority of application No. 202010385640.4 (CN), filed on May 9, 2020.
Prior Publication US 2022/0217792 A1, Jul. 7, 2022
Int. Cl. H04W 74/08 (2024.01); G06N 3/04 (2023.01); G06N 3/045 (2023.01); H04L 5/00 (2006.01); H04W 74/00 (2009.01)
CPC H04W 74/0875 (2013.01) [G06N 3/045 (2023.01); H04L 5/003 (2013.01); H04W 74/002 (2013.01)] 8 Claims
OG exemplary drawing
 
1. An industrial 5G dynamic multi-priority multi-access method based on deep reinforcement learning, comprising the following steps:
1) establishing a dynamic multi-priority multi-channel access neural network model based on deep reinforcement learning;
2) collecting state, action and reward information of T time slots of all industrial 5G terminals in the industrial 5G network as training data to train the neural network model; and
3) collecting the state information of all the industrial 5G terminals in the industrial 5G network at the current time slot as the input of the neural network model; conducting multi-priority channel allocation through the neural network model; and conducting multi-access by the industrial 5G terminals according to a channel allocation result,
wherein step 1 further comprises:
constructing two neural network models with the same structure, a q-eval deep neural network and a q-next deep neural network, with neural network parameters params=[xin, xrnn, xfc, xout, w, b],
wherein xin represents the number of neurons on an input layer and is equal to the length of a state vector sn of the industrial 5G terminal n (n∈N), N represents the number of the industrial 5G terminals, xrnn represents the number of neurons on a recurrent neural network layer, xfc represents the number of neurons on a fully connected layer, xout represents the number of neurons on an output layer and is equal to the length of an action vector an of the industrial 5G terminal n, w represents a weight, and b represents an offset,
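By way of illustration only, a minimal sketch of this two-network construction, assuming PyTorch and a GRU standing in for the recurrent layer (the claim does not fix the recurrent cell type); all layer sizes are placeholder values for xin, xrnn, xfc and xout:

```python
# Minimal sketch, assuming PyTorch; sizes are illustrative stand-ins
# for params = [xin, xrnn, xfc, xout, w, b].
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Common structure of the q-eval and q-next networks:
    input layer -> recurrent layer -> fully connected layer -> output layer."""
    def __init__(self, x_in, x_rnn, x_fc, x_out):
        super().__init__()
        self.rnn = nn.GRU(x_in, x_rnn, batch_first=True)   # recurrent layer
        self.fc = nn.Linear(x_rnn, x_fc)                   # fully connected layer
        self.out = nn.Linear(x_fc, x_out)                  # one Q-value per action in a_n

    def forward(self, s_n):
        # s_n: (batch, time_slots, x_in) history of state vectors of terminal n
        h, _ = self.rnn(s_n)
        h = torch.relu(self.fc(h[:, -1, :]))               # hidden state of the last time slot
        return self.out(h)                                 # Q(s_n, a) for every action a

x_in, x_rnn, x_fc, x_out = 8, 64, 32, 4                    # illustrative sizes
q_eval = QNetwork(x_in, x_rnn, x_fc, x_out)
q_next = QNetwork(x_in, x_rnn, x_fc, x_out)
q_next.load_state_dict(q_eval.state_dict())                # identical initialization
```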
wherein the q-eval deep neural network is used for obtaining a valuation function Q(sn, an) of the action vector an in the current state vector sn of the industrial 5G terminal n, and the q-next deep neural network is used for obtaining the maximum valuation function maxa′n Q(s′n, a′n) over the action vectors a′n of the next state vector s′n of the industrial 5G terminal n;
updating the q-eval deep neural network parameters w and b using the reinforcement learning update Q(sn, an)=Q(sn, an)+α(rn+γ maxa′n Q(s′n, a′n)−Q(sn, an)), wherein α represents the learning rate, γ represents the discount ratio, and rn represents the reward obtained by the industrial 5G terminal n by executing the action vector an in the current state sn,
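Continuing the sketch above, for a deep network the tabular rule Q(sn, an)=Q(sn, an)+α(rn+γ maxa′n Q(s′n, a′n)−Q(sn, an)) is realized as a gradient step on the squared temporal-difference error, with α as the optimizer's learning rate; the tensor shapes and the Adam optimizer are assumptions, not fixed by the claim:

```python
import torch
import torch.nn.functional as F

alpha, gamma = 1e-3, 0.9                       # learning rate and discount ratio (illustrative)
optimizer = torch.optim.Adam(q_eval.parameters(), lr=alpha)

def td_update(s_n, a_n, r_n, s_next):
    """One Q-learning step; s_n and s_next are (batch, time, x_in) state histories,
    a_n is a (batch,) long tensor of chosen actions, r_n a (batch,) reward tensor."""
    q_sa = q_eval(s_n).gather(1, a_n.unsqueeze(1)).squeeze(1)  # Q(s_n, a_n)
    with torch.no_grad():
        q_max = q_next(s_next).max(dim=1).values               # max_{a'_n} Q(s'_n, a'_n)
        target = r_n + gamma * q_max                           # TD target
    loss = F.mse_loss(q_sa, target)                            # squared TD error
    optimizer.zero_grad()
    loss.backward()                                            # gradient step updates w and b of q-eval
    optimizer.step()
```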
the initialization parameters of the q-next deep neural network being the same as the initialization parameters of the q-eval deep neural network; updating the parameters w and b of the q-eval deep neural network after each iterative training of the neural network model, and updating the parameters w and b of the q-next deep neural network once after every I iterative trainings of the neural network model.
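Putting the schedule together, a minimal sketch of the training loop (step 2) and the channel-allocation inference (step 3), continuing the code above: q-eval is updated on every iteration, while q-next is synchronized to q-eval once every I iterations; replay_batches() and current_state_history are hypothetical placeholders for the collected T time slots of training data and the current-slot state input:

```python
I = 100                                                    # target-sync period (illustrative)

# Step 2: iterative training on the T collected time slots.
for iteration, (s_n, a_n, r_n, s_next) in enumerate(replay_batches(), start=1):  # hypothetical helper
    td_update(s_n, a_n, r_n, s_next)                       # w, b of q-eval updated each iteration
    if iteration % I == 0:
        q_next.load_state_dict(q_eval.state_dict())        # w, b of q-next updated once per I iterations

# Step 3: at the current time slot, each terminal feeds its state history to the
# trained model and accesses the channel whose action has the highest Q-value.
with torch.no_grad():
    channel = q_eval(current_state_history).argmax(dim=1)  # placeholder (batch, time, x_in) input
```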