| CPC G06N 3/092 (2023.01) [H04L 5/0005 (2013.01); H04L 5/0037 (2013.01)] | 2 Claims |

1. A method for user grouping and resource allocation in a NOMA-MEC system, comprising:
step 1, constructing the NOMA-MEC system, wherein the NOMA-MEC system comprises K user devices and a single antenna base station connected to an edge server, each user device has only a single transmitting antenna to establish a communication link with the single antenna base station, and the NOMA-MEC system operates in time slots, wherein a set of the time slots is denoted as Γ={1, 2, . . . , T};
a process of constructing the NOMA-MEC system comprises the following steps:
dividing a total bandwidth W of the NOMA-MEC system into N orthogonal sub-channels, wherein a ratio of a bandwidth of a sub-channel n to the total bandwidth W is τn,
$$\sum_{n=1}^{N}\tau_{n}=1,\quad 0\le\tau_{n}\le 1,$$
wherein K={1, 2, . . . , K} and N={1, 2, . . . , N} represent a set of users and a set of the orthogonal sub-channels, respectively, and K≤2N;
dividing a whole process into the time slots Γ={1, 2, . . . , T}, wherein a channel gain remains constant within one time slot and varies between different time slots; hnk, n∈N, k∈K, denotes the channel gain on the sub-channel n from a user k to the single antenna base station, and hn1&lt;hn2&lt; . . . &lt;hnK, n∈[1,N] is set; and
limiting each sub-channel to transmit at most two user signals simultaneously, and limiting each user to transmit signals on only one sub-channel in one time slot, wherein mnk=1 indicates that the sub-channel n is allocated to the user k for transmitting signals, and mnk=0 indicates that the sub-channel n is not allocated to the user k for transmitting signals;
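For illustration only, the following is a minimal Python sketch of the system model of step 1. The values of K, N, W, and T are assumed examples (the claim does not fix them), and `is_valid_grouping` is a hypothetical helper that checks the per-sub-channel and per-user allocation limits stated above.

```python
import numpy as np

# Illustrative values; K, N, W, and T are assumptions, not claim limitations.
K, N = 8, 4          # K user devices, N orthogonal sub-channels (K <= 2N)
W = 10e6             # total bandwidth W in Hz (assumed)
T = 1000             # number of time slots in Γ = {1, ..., T}

def is_valid_grouping(m: np.ndarray) -> bool:
    """Check a binary sub-channel allocation matrix m of shape (N, K):
    each sub-channel carries at most two user signals, and each user
    transmits on at most one sub-channel in a time slot (step 1)."""
    return bool((m.sum(axis=1) <= 2).all() and (m.sum(axis=0) <= 1).all())

# Example: pair users 0 and 1 on sub-channel 0, users 2 and 3 on sub-channel 1.
m = np.zeros((N, K), dtype=int)
m[0, 0] = m[0, 1] = 1
m[1, 2] = m[1, 3] = 1
print(is_valid_grouping(m))   # True
```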
step 2, defining an energy efficiency of the NOMA-MEC system, wherein the defining the energy efficiency of the NOMA-MEC system comprises:
step 2.1) the energy efficiency Y of the NOMA-MEC system is defined as a sum of ratios of a computation rate to a computation power over all users, as in the following formula:
$$Y=\sum_{i=1}^{K}\left[\frac{\sum_{n=1}^{N}x_{ni}R_{i,off}}{p_{i}}+\frac{\left(1-\sum_{n=1}^{N}x_{ni}\right)R_{i,local}}{p_{i,local}}\right],$$
wherein Ri,off denotes a computation rate at which a user i offloads computational tasks to the edge server for execution; pi is a transmitting power of the user i, which does not vary over time and is the same for all users; Ri,local denotes a computation rate at which the user i locally executes a task; pi,local denotes a power at which the user i locally executes the task; xni=1 denotes that the user i offloads the task through the sub-channel n to the edge server for execution, and xni=0 denotes that the user i does not offload the task through the sub-channel n to the edge server for execution;
step 2.2) because a channel gain hni of the user i on the sub-channel n is greater than a channel gain hnj of a user j, the single antenna base station decodes in a descending order according to a successive interference cancellation technique, and an offload rate of the user i is
$$R_{i,off}=\tau_{n}W\log_{2}\left(1+\frac{p_{i}h_{ni}}{p_{j}h_{nj}+\tau_{n}WN_{0}}\right),$$
and an offload rate of the user j is
$$R_{j,off}=\tau_{n}W\log_{2}\left(1+\frac{p_{j}h_{nj}}{\tau_{n}WN_{0}}\right),$$
where N0 is a power spectral density of noise; and
step 2.3) computation rates of the user i and the user j locally executed are
$$R_{i,local}=\frac{f_{i}}{\phi}\quad\text{and}\quad R_{j,local}=\frac{f_{j}}{\phi},$$
respectively, where fi and fj are CPU computing frequencies of the user i and the user j, respectively, and ϕ is a number of cycles required to process a 1-bit task; computation powers of the user i and the user j locally executed are pi,local=vfi³ and pj,local=vfj³, respectively, where v is an effective capacitance coefficient of a chip architecture of a user device;
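As a sketch of step 2, the following Python functions compute the offloading and local-execution quantities. Since the claim's rate expressions were rendered as images, the block assumes the standard uplink NOMA rates under successive interference cancellation consistent with step 2.2, together with the local model of step 2.3; parameter names are illustrative.

```python
import numpy as np

def offload_rates(tau_n, W, p_i, p_j, h_ni, h_nj, N0):
    """Uplink NOMA rates on sub-channel n for a user pair (i, j) with
    h_ni > h_nj: the base station decodes user i first, treating user j
    as interference, then decodes user j interference-free after SIC."""
    bw = tau_n * W                                   # bandwidth of sub-channel n
    r_i_off = bw * np.log2(1 + p_i * h_ni / (p_j * h_nj + bw * N0))
    r_j_off = bw * np.log2(1 + p_j * h_nj / (bw * N0))
    return r_i_off, r_j_off

def local_rate_and_power(f, phi, v):
    """Local execution model of step 2.3: rate f/phi and power v*f**3."""
    return f / phi, v * f ** 3
```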
step 3, constructing an optimization problem; wherein the optimization problem is constructed as:
$$\max_{\{\tau_{n}\},\{m_{nk}\},\{x_{nk}\}} Y\quad\text{s.t.}\quad \sum_{k=1}^{K}m_{nk}\le 2,\ \forall n\in N;\quad \sum_{n=1}^{N}m_{nk}\le 1,\ \forall k\in K;\quad x_{nk}\le m_{nk},\ \forall n\in N, k\in K;\quad \sum_{n=1}^{N}\tau_{n}=1,\ 0\le\tau_{n}\le 1;\quad m_{nk},x_{nk}\in\{0,1\};$$
step 4, defining a state space for deep reinforcement learning, and an action space for the deep reinforcement learning; wherein the defining the state space for deep reinforcement learning, and the action space for deep reinforcement learning comprises:
step 4.1) the state space s, s={h11, h12, . . . , h1K, h21, h22, . . . , h2K, . . . , hN1, hN2, . . . , hNK};
step 4.2) the action space a comprises two parts, a={a_c, a_d}, wherein a_c={τ1, τ2, . . . , τN} is a continuous action which indicates a bandwidth allocation ratio of each sub-channel of the NOMA-MEC system, and a_d={m11, m12, . . . , m1K, . . . , mN1, mN2, . . . , mNK, x11, x12, . . . , x1K, . . . , xN1, xN2, . . . , xNK} is a discrete action which indicates a sub-channel allocation scheme and an offloading decision of each user;
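The following short Python sketch shows how the state and action spaces of step 4 map to vector dimensions; the sizes and the randomly generated channel gains are stand-in assumptions for illustration.

```python
import numpy as np

K, N = 8, 4                              # illustrative sizes, as above

h = np.abs(np.random.randn(N, K)) ** 2   # stand-in channel gains h_nk for one time slot
s = h.flatten()                          # state s = {h_11, ..., h_1K, ..., h_N1, ..., h_NK}

state_dim = N * K                        # dimension of the state space
continuous_action_dim = N                # a_c = {tau_1, ..., tau_N}
discrete_action_dim = 2 * N * K          # a_d covers all m_nk and x_nk
```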
step 5, constructing a hybrid deep reinforcement learning network, wherein an input of the hybrid deep reinforcement learning network is a state and an output of the hybrid deep reinforcement learning network is an action;
wherein constructing the hybrid deep reinforcement learning network comprises: the hybrid deep reinforcement learning network comprises a continuous layer deep reinforcement learning network and a discrete layer deep reinforcement learning network; the continuous layer deep reinforcement learning network is a deep deterministic policy gradient (DDPG) network, and the discrete layer deep reinforcement learning network is a deep Q-Network (DQN);
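A minimal PyTorch sketch of the two components named in step 5 follows. The fully connected layers, hidden width, and softmax output for the bandwidth ratios are assumptions (the claim only names the DDPG and DQN parts), and the DDPG critic and target networks are omitted for brevity.

```python
import torch
import torch.nn as nn

class DDPGActor(nn.Module):
    """Continuous-layer actor (step 5): maps a state to the bandwidth
    allocation ratios a_c; the softmax keeps the ratios non-negative and
    summing to one (an assumption, not fixed by the claim)."""
    def __init__(self, state_dim: int, n_subchannels: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_subchannels),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.net(s), dim=-1)

class DQNHead(nn.Module):
    """Discrete-layer network (step 5): maps a state to Q-values over the
    candidate user-grouping actions a_d."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)
```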
step 6, inputting a state of each of the time slots into the hybrid deep reinforcement learning network to generate the action, wherein the inputting the state of each of the time slots into the hybrid deep reinforcement learning network to generate the action comprises:
step 6.1) inputting the state of the NOMA-MEC system into the hybrid deep reinforcement learning network, generating the bandwidth allocation ratio a_c by an Actor network of the DDPG, and generating a user grouping situation a_d by the DQN network; and
step 6.2) after the user grouping situation and the bandwidth allocation ratio are determined, maximizing the energy efficiency of the NOMA-MEC system is decomposed into maximizing an energy efficiency Yn of each sub-channel;
the problem is transformed into
$$Y_{n}=\frac{x_{n,i}R_{i,off}}{p_{i}}+\frac{(1-x_{n,i})R_{i,local}}{p_{i,local}}+\frac{x_{n,j}R_{j,off}}{p_{j}}+\frac{(1-x_{n,j})R_{j,local}}{p_{j,local}},$$
where a matrix X is initialized as a zero matrix at each time slot; (xn,i, xn,j) has four possible values: (0, 0), (1, 0), (0, 1) and (1, 1), wherein a value of x determines an offloading decision, 0 means that a computation task of a user device is not offloaded to the edge server for execution, and 1 means that the computation task is offloaded to the edge server for execution; the four possible values are inputted into the above formula to select the one which makes Yn the maximum, and a value at a corresponding position of the matrix X is set accordingly;
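The enumeration of step 6.2 can be sketched in Python as below. The Yn expression mirrors the reconstructed formula above and is therefore an assumption, as is the convention that a task runs locally whenever it is not offloaded.

```python
from itertools import product

def best_offloading_pair(r_i_off, r_j_off, r_i_loc, r_j_loc,
                         p_i, p_j, p_i_loc, p_j_loc):
    """Step 6.2: enumerate the four values of (x_ni, x_nj) and return the
    pair that maximizes the sub-channel energy efficiency Y_n."""
    best_x, best_y = (0, 0), float("-inf")
    for x_i, x_j in product((0, 1), repeat=2):
        y_n = (x_i * r_i_off / p_i + (1 - x_i) * r_i_loc / p_i_loc
               + x_j * r_j_off / p_j + (1 - x_j) * r_j_loc / p_j_loc)
        if y_n > best_y:
            best_x, best_y = (x_i, x_j), y_n
    return best_x, best_y
```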
step 7, training the hybrid deep reinforcement learning network, wherein the training the hybrid deep reinforcement learning network comprises:
the base station in a state s receives an immediate reward r=Y fed back from the environment after performing the action a={a_c, a_d}, and obtains a state s′ of a next time slot;
storing a sample (s, a_c, r, s′) to a DDPG experience pool, and storing a sample (s, a_d, r, s′) to a DQN experience pool, wherein the DDPG network and the DQN network share the state and the reward value; and
training the DDPG network and the DQN network to update their own parameters by sampling D samples from their respective experience pools;
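A minimal sketch of the dual experience pools of step 7 follows; the capacity and the uniform sampling strategy are assumptions not specified in the claim.

```python
import random
from collections import deque

class ReplayPool:
    """Experience pool of step 7. The DDPG pool stores (s, a_c, r, s') and
    the DQN pool stores (s, a_d, r, s'); the state and reward are shared."""
    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size: int):
        return random.sample(self.buffer, batch_size)

ddpg_pool, dqn_pool = ReplayPool(), ReplayPool()
# After each interaction step, store (s, a_c, r, s') and (s, a_d, r, s'),
# then sample D transitions from each pool to update the respective network.
```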
step 8, repeating steps 6 and 7 until a number of repetitions reaches a specified number of time slots T, and outputting an optimized decision comprising the user grouping, a computation offloading decision, and the bandwidth allocation ratio at this time.
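Putting steps 6 through 8 together, the following outline assumes a hypothetical environment `env` exposing `reset()`/`step()` and a hypothetical `agent` object wrapping the DDPG and DQN networks and their pools; all of these names are illustrative and not part of the claim.

```python
def run_training(env, agent, T: int, D: int):
    """Outline of the per-slot interaction and training loop (steps 6-8)."""
    s = env.reset()
    for t in range(T):
        a_c = agent.act_continuous(s)        # bandwidth allocation ratios (step 6.1)
        a_d = agent.act_discrete(s)          # user grouping / offloading (steps 6.1-6.2)
        s_next, r = env.step(a_c, a_d)       # immediate reward from the environment (step 7)
        agent.ddpg_pool.store(s, a_c, r, s_next)
        agent.dqn_pool.store(s, a_d, r, s_next)
        agent.update(D)                      # sample D transitions and update both networks
        s = s_next
    return agent.current_decision()          # user grouping, offloading, bandwidth ratios (step 8)
```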