US 12,408,079 B2
Apparatus, system, method and computer-implemented storage media to implement radio resource management policies using machine learning
Ravikumar Balakrishnan, Beaverton, OR (US); Nageen Himayat, Fremont, CA (US); Arjun Anand, Milpitas, CA (US); Mustafa Riza Akdeniz, San Jose, CA (US); Sagar Dhakal, Los Altos, CA (US); Mark R. Eisen, Beaverton, OR (US); and Navid Naderializadeh, Woodland Hills, CA (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Apr. 1, 2022, as Appl. No. 17/712,050.
Prior Publication US 2022/0377614 A1, Nov. 24, 2022
Int. Cl. H04W 28/08 (2023.01); H04W 28/02 (2009.01)
CPC H04W 28/0925 (2020.05) [H04W 28/0252 (2013.01); H04W 28/0268 (2013.01)] 25 Claims
OG exemplary drawing
 
1. An apparatus of a transmitter computing node n (TX node n) of a wireless network, the apparatus including an interconnect interface to connect the apparatus to one or more components of the TX node n, and one or more processors to: implement machine learning (ML) based training rounds, each training round including:
determining, using reinforcement learning, a local action value function Q_n(h_n, a_n; θ_n) corresponding to a value of performing a radio resource management (RRM) action a_n at a receiving computing node n (RX node n) associated with TX node n using policy parameter θ_n and based on h_n, h_n including channel state information at RX node n; and
determining, based on an overall action value function Q_tot at time t, an estimated gradient ∇L_t(θ_t) of an overall loss at time t for overall policy parameter θ_t, wherein Q_tot corresponds to a mixing of the local action value functions Q_i(h_i, a_i; θ_i) for all TX nodes i in the network at time t, including TX node n; and
determine, in response to a determination that ∇L_t(θ_t) is close to zero for various values of t during training, a trained local action value function Q_n,trained to generate a trained action value relating to data communication between TX node n and RX node n.
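
The claim describes a multi-agent, value-decomposition style training loop: each TX node keeps a local Q-function over its own channel state, the local values are mixed into an overall Q_tot, and training of the local policy parameters continues until the gradient of the overall loss is close to zero. The sketch below is one illustrative way to realize such a loop, assuming additive (VDN-style) mixing for Q_tot, a TD-style mean-squared error as the overall loss L_t(θ_t), and a gradient-norm threshold as the "close to zero" stopping test; the names LocalQ, q_tot, train, and replay_sampler are hypothetical and not taken from the patent.

```python
# Minimal sketch of the claimed training round (all names are illustrative).
import torch
import torch.nn as nn


class LocalQ(nn.Module):
    """Q_n(h_n, .; theta_n): maps local channel state h_n to per-action values."""
    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, num_actions)
        )

    def forward(self, h_n: torch.Tensor) -> torch.Tensor:
        return self.net(h_n)  # shape: (batch, num_actions)


def q_tot(local_qs, h, a):
    """Mix the chosen local action values of all TX nodes into Q_tot.

    Additive mixing is one possible instantiation of the 'mixing of local
    action value functions' recited in the claim; monotonic mixing networks
    are another.
    """
    per_node = [q(h[:, i]).gather(1, a[:, i:i + 1]) for i, q in enumerate(local_qs)]
    return torch.cat(per_node, dim=1).sum(dim=1, keepdim=True)


def train(local_qs, replay_sampler, gamma=0.99, lr=1e-3, grad_tol=1e-4, max_rounds=10_000):
    """Run training rounds until the estimated gradient of the overall loss is near zero."""
    params = [p for q in local_qs for p in q.parameters()]
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(max_rounds):
        h, a, r, h_next = replay_sampler()          # batched (state, action, reward, next state)
        with torch.no_grad():                       # bootstrapped TD target
            next_q = torch.stack(
                [q(h_next[:, i]).max(dim=1).values for i, q in enumerate(local_qs)],
                dim=1,
            ).sum(dim=1, keepdim=True)
            target = r + gamma * next_q
        loss = nn.functional.mse_loss(q_tot(local_qs, h, a), target)  # overall loss L_t(theta_t)
        opt.zero_grad()
        loss.backward()                             # estimated gradient of the overall loss
        grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
        opt.step()
        if grad_norm < grad_tol:                    # "close to zero" stopping criterion
            break
    return local_qs                                 # trained Q_n,trained for each TX node
```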