| CPC G06N 3/092 (2023.01) [H04L 5/0005 (2013.01); H04L 5/0037 (2013.01)] | 2 Claims |

1. A method for user grouping and resource allocation in a NOMA-MEC system, comprising:
step 1, constructing the NOMA-MEC system, wherein the NOMA-MEC system comprises K user devices and a single antenna base station connected to an edge server, each user device has only a single transmitting antenna to establish a communication link with the single antenna base station, and the NOMA-MEC system operates in time slots, wherein a set of the time slots is denoted as Γ={1, 2, . . . , T};
a process of constructing the NOMA-MEC system comprises the following steps:
dividing a total bandwidth W of the NOMA-MEC system into N orthogonal sub-channels, wherein a ratio of a bandwidth of a sub-channel n to the total bandwidth W is τn,
$$\sum_{n=1}^{N}\tau_{n}=1,\quad 0\le\tau_{n}\le 1,$$
wherein K={1, 2, . . . , K} and N={1, 2, . . . , N} represent a set of users and a set of the orthogonal sub-channels, respectively, and K≤2N;
dividing a whole process into the time slots Γ={1, 2, . . . , T}, wherein a channel gain remains constant within one time slot and varies between different time slots; hnk, n∈N, k∈K, denotes the channel gain on the sub-channel n from a user k to the single antenna base station, and hn1&lt;hn2&lt; . . . &lt;hnK, n∈[1,N] is set; and
limiting each sub-channel to transmit at most two user signals simultaneously, and limiting each user to transmit signals on only one sub-channel in one time slot, wherein mnk=1 indicates that the sub-channel n is allocated to the user k for transmitting signals, and mnk=0 indicates that the sub-channel n is not allocated to the user k for transmitting signals;
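For illustration only, the following is a minimal Python sketch of the system model of step 1. The values of K, N, W, and T are assumed examples (the claim does not fix them), and `is_valid_grouping` is a hypothetical helper that checks the per-sub-channel and per-user allocation limits stated above.

```python
import numpy as np

# Illustrative values; K, N, W, and T are assumptions, not claim limitations.
K, N = 8, 4          # K user devices, N orthogonal sub-channels (K <= 2N)
W = 10e6             # total bandwidth W in Hz (assumed)
T = 1000             # number of time slots in Γ = {1, ..., T}

def is_valid_grouping(m: np.ndarray) -> bool:
    """Check a binary sub-channel allocation matrix m of shape (N, K):
    each sub-channel carries at most two user signals, and each user
    transmits on at most one sub-channel in a time slot (step 1)."""
    return bool((m.sum(axis=1) <= 2).all() and (m.sum(axis=0) <= 1).all())

# Example: pair users 0 and 1 on sub-channel 0, users 2 and 3 on sub-channel 1.
m = np.zeros((N, K), dtype=int)
m[0, 0] = m[0, 1] = 1
m[1, 2] = m[1, 3] = 1
print(is_valid_grouping(m))   # True
```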
step 2, defining an energy efficiency of the NOMA-MEC system, wherein the defining the energy efficiency of the NOMA-MEC system comprises:
step 2.1) the energy efficiency Y of the NOMA-MEC system is defined as a sum of ratios of a computation rate to a computation power over all users, as in the following formula:
$$Y=\sum_{i=1}^{K}\left[\frac{\sum_{n=1}^{N}x_{ni}R_{i,off}}{p_{i}}+\frac{\left(1-\sum_{n=1}^{N}x_{ni}\right)R_{i,local}}{p_{i,local}}\right],$$
wherein Ri,off denotes a computation rate at which a user i offloads computational tasks to the edge server for execution; pi is a transmitting power of the user i, which does not vary over time and is the same for all users; Ri,local denotes a computation rate at which the user i locally executes a task; pi,local denotes a power at which the user i locally executes the task; xni=1 denotes that the user i offloads the task through the sub-channel n to the edge server for execution, and xni=0 denotes that the user i does not offload the task through the sub-channel n to the edge server for execution;
step 2.2) because a channel gain hni of the user i on the sub-channel n is greater than a channel gain hnj of a user j, the single antenna base station decodes in a descending order according to a successive interference cancellation technique, and an offload rate of the user i is
$$R_{i,off}=\tau_{n}W\log_{2}\left(1+\frac{p_{i}h_{ni}}{p_{j}h_{nj}+\tau_{n}WN_{0}}\right),$$
and an offload rate of the user j is
$$R_{j,off}=\tau_{n}W\log_{2}\left(1+\frac{p_{j}h_{nj}}{\tau_{n}WN_{0}}\right),$$
where N0 is a power spectral density of noise; and
step 2.3) computation rates of the user i and the user j locally executed are
$$R_{i,local}=\frac{f_{i}}{\phi}\quad\text{and}\quad R_{j,local}=\frac{f_{j}}{\phi},$$
respectively, where fi and fj are CPU computing frequencies of the user i and the user j, respectively, and ϕ is a number of cycles required to process a 1-bit task; computation powers of the user i and the user j locally executed are pi,local=vfi³ and pj,local=vfj³, respectively, where v is an effective capacitance coefficient of a chip architecture of a user device;
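As a sketch of step 2, the following Python functions compute the offloading and local-execution quantities. Since the claim's rate expressions were rendered as images, the block assumes the standard uplink NOMA rates under successive interference cancellation consistent with step 2.2, together with the local model of step 2.3; parameter names are illustrative.

```python
import numpy as np

def offload_rates(tau_n, W, p_i, p_j, h_ni, h_nj, N0):
    """Uplink NOMA rates on sub-channel n for a user pair (i, j) with
    h_ni > h_nj: the base station decodes user i first, treating user j
    as interference, then decodes user j interference-free after SIC."""
    bw = tau_n * W                                   # bandwidth of sub-channel n
    r_i_off = bw * np.log2(1 + p_i * h_ni / (p_j * h_nj + bw * N0))
    r_j_off = bw * np.log2(1 + p_j * h_nj / (bw * N0))
    return r_i_off, r_j_off

def local_rate_and_power(f, phi, v):
    """Local execution model of step 2.3: rate f/phi and power v*f**3."""
    return f / phi, v * f ** 3
```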
step 3, constructing an optimization problem; wherein the optimization problem is constructed as:
$$\max_{\{\tau_{n}\},\{m_{nk}\},\{x_{nk}\}} Y\quad\text{s.t.}\quad \sum_{k=1}^{K}m_{nk}\le 2,\ \forall n\in N;\quad \sum_{n=1}^{N}m_{nk}\le 1,\ \forall k\in K;\quad x_{nk}\le m_{nk},\ \forall n\in N, k\in K;\quad \sum_{n=1}^{N}\tau_{n}=1,\ 0\le\tau_{n}\le 1;\quad m_{nk},x_{nk}\in\{0,1\};$$
step 4, defining a state space for deep reinforcement learning, and an action space for the deep reinforcement learning; wherein the defining the state space for deep reinforcement learning, and the action space for deep reinforcement learning comprises:
step 4.1) the state space s, s={h11, h12, . . . , h1K, h21, h22, . . . , h2K, . . . , hN1, hN2, . . . , hNK};
step 4.2) the action space a comprises two parts, a={a_c, a_d}, wherein a_c={τ1, τ2, . . . , τN} is a continuous action which indicates a bandwidth allocation ratio of each sub-channel of the NOMA-MEC system, and a_d={m11, m12, . . . , m1K, . . . , mN1, mN2, . . . , mNK, x11, x12, . . . , x1K, . . . , xN1, xN2, . . . , xNK} is a discrete action which indicates a sub-channel allocation scheme and an offloading decision of each user;
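The following short Python sketch shows how the state and action spaces of step 4 map to vector dimensions; the sizes and the randomly generated channel gains are stand-in assumptions for illustration.

```python
import numpy as np

K, N = 8, 4                              # illustrative sizes, as above

h = np.abs(np.random.randn(N, K)) ** 2   # stand-in channel gains h_nk for one time slot
s = h.flatten()                          # state s = {h_11, ..., h_1K, ..., h_N1, ..., h_NK}

state_dim = N * K                        # dimension of the state space
continuous_action_dim = N                # a_c = {tau_1, ..., tau_N}
discrete_action_dim = 2 * N * K          # a_d covers all m_nk and x_nk
```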
step 5, constructing a hybrid deep reinforcement learning network, wherein an input of the hybrid deep reinforcement learning network is a state and an output of the hybrid deep reinforcement learning network is an action;
wherein constructing the hybrid deep reinforcement learning network comprises: the hybrid deep reinforcement learning network comprises a continuous layer deep reinforcement learning network and a discrete layer deep reinforcement learning network; the continuous layer deep reinforcement learning network is a deep deterministic policy gradient (DDPG) network, and the discrete layer deep reinforcement learning network is a deep Q-Network (DQN);
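A minimal PyTorch sketch of the two components named in step 5 follows. The fully connected layers, hidden width, and softmax output for the bandwidth ratios are assumptions (the claim only names the DDPG and DQN parts), and the DDPG critic and target networks are omitted for brevity.

```python
import torch
import torch.nn as nn

class DDPGActor(nn.Module):
    """Continuous-layer actor (step 5): maps a state to the bandwidth
    allocation ratios a_c; the softmax keeps the ratios non-negative and
    summing to one (an assumption, not fixed by the claim)."""
    def __init__(self, state_dim: int, n_subchannels: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_subchannels),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.net(s), dim=-1)

class DQNHead(nn.Module):
    """Discrete-layer network (step 5): maps a state to Q-values over the
    candidate user-grouping actions a_d."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)
```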
step 6, inputting a state of each of the time slots into the hybrid deep reinforcement learning network to generate the action, wherein the inputting the state of each of the time slots into the hybrid deep reinforcement learning network to generate the action comprises:
step 6.1) inputting the state of the NOMA-MEC system into the hybrid deep reinforcement learning network, generating the bandwidth allocation ratio a_c by an Actor network of the DDPG, and generating a user grouping situation a_d by the DQN network; and
step 6.2) after the user grouping situation and the bandwidth allocation ratio are determined, maximizing the energy efficiency of the NOMA-MEC system is decomposed into maximizing an energy efficiency Yn of each sub-channel;
the problem is transformed into
$$Y_{n}=\frac{x_{n,i}R_{i,off}}{p_{i}}+\frac{(1-x_{n,i})R_{i,local}}{p_{i,local}}+\frac{x_{n,j}R_{j,off}}{p_{j}}+\frac{(1-x_{n,j})R_{j,local}}{p_{j,local}},$$
where a matrix X is initialized as a zero matrix at each time slot; (xn,i, xn,j) has four possible values: (0, 0), (1, 0), (0, 1) and (1, 1), wherein a value of x determines an offloading decision, 0 means that a computation task of a user device is not offloaded to the edge server for execution, and 1 means that the computation task is offloaded to the edge server for execution; the four possible values are inputted into the above formula to select the one which makes Yn the maximum, and a value at a corresponding position of the matrix X is set accordingly;
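The enumeration of step 6.2 can be sketched in Python as below. The Yn expression mirrors the reconstructed formula above and is therefore an assumption, as is the convention that a task runs locally whenever it is not offloaded.

```python
from itertools import product

def best_offloading_pair(r_i_off, r_j_off, r_i_loc, r_j_loc,
                         p_i, p_j, p_i_loc, p_j_loc):
    """Step 6.2: enumerate the four values of (x_ni, x_nj) and return the
    pair that maximizes the sub-channel energy efficiency Y_n."""
    best_x, best_y = (0, 0), float("-inf")
    for x_i, x_j in product((0, 1), repeat=2):
        y_n = (x_i * r_i_off / p_i + (1 - x_i) * r_i_loc / p_i_loc
               + x_j * r_j_off / p_j + (1 - x_j) * r_j_loc / p_j_loc)
        if y_n > best_y:
            best_x, best_y = (x_i, x_j), y_n
    return best_x, best_y
```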
step 7, training the hybrid deep reinforcement learning network, wherein the training the hybrid deep reinforcement learning network comprises:
the base station in a state s receives an immediate reward r=Y fed back from the environment after performing the action a={a_c, a_d}, and obtains a state s′ of a next time slot;
storing a sample (s, a_c, r, s′) to a DDPG experience pool, and storing a sample (s, a_d, r, s′) to a DQN experience pool, wherein the DDPG network and the DQN network share the state and the reward value; and
training the DDPG network and the DQN network to update their own parameters by sampling D samples from their respective experience pools;
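A minimal sketch of the dual experience pools of step 7 follows; the capacity and the uniform sampling strategy are assumptions not specified in the claim.

```python
import random
from collections import deque

class ReplayPool:
    """Experience pool of step 7. The DDPG pool stores (s, a_c, r, s') and
    the DQN pool stores (s, a_d, r, s'); the state and reward are shared."""
    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size: int):
        return random.sample(self.buffer, batch_size)

ddpg_pool, dqn_pool = ReplayPool(), ReplayPool()
# After each interaction step, store (s, a_c, r, s') and (s, a_d, r, s'),
# then sample D transitions from each pool to update the respective network.
```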
step 8, repeating steps 6 and 7 until a number of repetitions reaches a specified number of time slots T, and outputting an optimized decision comprising the user grouping, a computation offloading decision, and the bandwidth allocation ratio at this time.
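Putting steps 6 through 8 together, the following outline assumes a hypothetical environment `env` exposing `reset()`/`step()` and a hypothetical `agent` object wrapping the DDPG and DQN networks and their pools; all of these names are illustrative and not part of the claim.

```python
def run_training(env, agent, T: int, D: int):
    """Outline of the per-slot interaction and training loop (steps 6-8)."""
    s = env.reset()
    for t in range(T):
        a_c = agent.act_continuous(s)        # bandwidth allocation ratios (step 6.1)
        a_d = agent.act_discrete(s)          # user grouping / offloading (steps 6.1-6.2)
        s_next, r = env.step(a_c, a_d)       # immediate reward from the environment (step 7)
        agent.ddpg_pool.store(s, a_c, r, s_next)
        agent.dqn_pool.store(s, a_d, r, s_next)
        agent.update(D)                      # sample D transitions and update both networks
        s = s_next
    return agent.current_decision()          # user grouping, offloading, bandwidth ratios (step 8)
```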