CPC H04W 28/0925 (2020.05) [H04W 28/0252 (2013.01); H04W 28/0268 (2013.01)] | 25 Claims |
1. An apparatus of a transmitter computing node n (TX node n) of a wireless network, the apparatus including an interconnect interface to connect the apparatus to one or more components of the TX node n, and one or more processors to: implement machine learning (ML) based training rounds, each training round including:
determining, using reinforcement learning, a local action value function Qn(hn, an; θn) corresponding to a value of performing a radio resource management (RRM) action an at a receiving computing node n (RX node n) associated with TX node n using policy parameter θn and based on hn, hn including channel state information at RX node n; and
determining, based on an overall action value function Qtot at time t, an estimated gradient of an overall loss at time t for overall policy parameter θt (∇Lt(θt)), wherein Qtot corresponds to a mixing of local action value functions Qi(hi, ai; θi) for all TX nodes i in the network at time t including TX node n; and
determine, in response to a determination that ∇Lt(θt) is close to zero for various values of t during training, a trained local action value function Qn,trained to generate a trained action value relating to data communication between TX node n and RX node n.
|
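The training round recited in claim 1 can be illustrated with a small, non-authoritative Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the claim: the local action value functions are linear in a scalar channel state, the mixing is a non-negative weighted sum, the loss is a mean squared error against a toy reward on a fixed batch, and "close to zero for various values of t" is read as the gradient norm staying below a tolerance for several consecutive rounds. The names `local_q`, `q_tot`, `loss_and_grad`, `train`, and `make_batch` are hypothetical.

```python
import random

def local_q(theta_n, h_n, a_n):
    """Local action value Q_n(h_n, a_n; theta_n); linear in h_n (assumption)."""
    bias, slope = theta_n[a_n]
    return bias + slope * h_n

def q_tot(theta, h, a, mix):
    """Mix the local values of all TX nodes into Q_tot.
    Assumption: a weighted sum with non-negative weights `mix`."""
    return sum(w * local_q(th, hi, ai) for w, th, hi, ai in zip(mix, theta, h, a))

def loss_and_grad(theta, batch, mix):
    """Overall loss L(theta) and its gradient, from Q_tot and toy targets."""
    grad = [[[0.0, 0.0] for _ in node] for node in theta]
    loss = 0.0
    for h, a, target in batch:
        err = q_tot(theta, h, a, mix) - target
        loss += 0.5 * err * err / len(batch)
        for n, (w, hi, ai) in enumerate(zip(mix, h, a)):
            grad[n][ai][0] += err * w / len(batch)       # d loss / d bias
            grad[n][ai][1] += err * w * hi / len(batch)  # d loss / d slope
    return loss, grad

def grad_norm(grad):
    """Euclidean norm of the full gradient."""
    return sum(g * g for node in grad for pair in node for g in pair) ** 0.5

def train(theta, batch, mix, lr=0.1, tol=1e-6, patience=5, max_rounds=20000):
    """Run training rounds; stop once the gradient norm has stayed below
    `tol` for `patience` consecutive rounds. Updates `theta` in place."""
    calm = 0
    for t in range(max_rounds):
        _, grad = loss_and_grad(theta, batch, mix)
        calm = calm + 1 if grad_norm(grad) < tol else 0
        if calm >= patience:
            return theta, t + 1, True
        for n, node in enumerate(theta):
            for a_idx, pair in enumerate(node):
                for k in range(2):
                    pair[k] -= lr * grad[n][a_idx][k]
    return theta, max_rounds, False

def make_batch(num_nodes, num_actions, size, seed=0):
    """Hypothetical data: channel states in [0, 1), random actions, toy reward."""
    rng = random.Random(seed)
    batch = []
    for _ in range(size):
        h = [rng.random() for _ in range(num_nodes)]
        a = [rng.randrange(num_actions) for _ in range(num_nodes)]
        # Toy reward: action 1 pays off in proportion to channel quality.
        target = sum(hi if ai == 1 else 0.2 for hi, ai in zip(h, a))
        batch.append((h, a, target))
    return batch

if __name__ == "__main__":
    mix = [1.0, 1.0]
    theta = [[[0.0, 0.0] for _ in range(2)] for _ in range(2)]
    theta, rounds, converged = train(theta, make_batch(2, 2, 64), mix)
    print("rounds:", rounds, "converged:", converged)
```

After `train` returns, the entry `theta[n]` plays the role of the trained local action value function for TX node n: each node can evaluate `local_q(theta[n], h_n, a)` for every candidate action using only its own channel state. A learned, state-dependent mixing network and a bootstrapped temporal-difference target would be natural replacements for the fixed weights and toy reward used here.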