US 12,408,079 B2
Apparatus, system, method and computer-implemented storage media to implement radio resource management policies using machine learning
Ravikumar Balakrishnan, Beaverton, OR (US); Nageen Himayat, Fremont, CA (US); Arjun Anand, Milpitas, CA (US); Mustafa Riza Akdeniz, San Jose, CA (US); Sagar Dhakal, Los Altos, CA (US); Mark R. Eisen, Beaverton, OR (US); and Navid Naderializadeh, Woodland Hills, CA (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Apr. 1, 2022, as Appl. No. 17/712,050.
Prior Publication US 2022/0377614 A1, Nov. 24, 2022
Int. Cl. H04W 28/08 (2023.01); H04W 28/02 (2009.01)
CPC H04W 28/0925 (2020.05) [H04W 28/0252 (2013.01); H04W 28/0268 (2013.01)] 25 Claims
OG exemplary drawing
 
1. An apparatus of a transmitter computing node n (TX node n) of a wireless network, the apparatus including an interconnect interface to connect the apparatus to one or more components of the TX node n, and one or more processors to: implement machine learning (ML) based training rounds, each training round including:
determining, using reinforcement learning, a local action value function Q_n(h_n, a_n; θ_n) corresponding to a value of performing a radio resource management (RRM) action a_n at a receiving computing node n (RX node n) associated with TX node n using policy parameter θ_n and based on h_n, h_n including channel state information at RX node n; and
determining, based on an overall action value function Q_tot at time t, an estimated gradient ∇L_t(θ_t) of an overall loss at time t for overall policy parameter θ_t, wherein Q_tot corresponds to a mixing of the local action value functions Q_i(h_i, a_i; θ_i) for all TX nodes i in the network at time t, including TX node n; and
determine, in response to a determination that ∇L_t(θ_t) is close to zero for various values of t during training, a trained local action value function Q_n,trained to generate a trained action value relating to data communication between TX node n and RX node n.
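
The claim describes a multi-agent, value-decomposition style training loop: each TX node keeps a local Q-function over its own channel state, the local values are mixed into an overall Q_tot, and training of the local policy parameters continues until the gradient of the overall loss is close to zero. The sketch below is one illustrative way to realize such a loop, assuming additive (VDN-style) mixing for Q_tot, a TD-style mean-squared error as the overall loss L_t(θ_t), and a gradient-norm threshold as the "close to zero" stopping test; the names LocalQ, q_tot, train, and replay_sampler are hypothetical and not taken from the patent.

```python
# Minimal sketch of the claimed training round (all names are illustrative).
import torch
import torch.nn as nn


class LocalQ(nn.Module):
    """Q_n(h_n, .; theta_n): maps local channel state h_n to per-action values."""
    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, num_actions)
        )

    def forward(self, h_n: torch.Tensor) -> torch.Tensor:
        return self.net(h_n)  # shape: (batch, num_actions)


def q_tot(local_qs, h, a):
    """Mix the chosen local action values of all TX nodes into Q_tot.

    Additive mixing is one possible instantiation of the 'mixing of local
    action value functions' recited in the claim; monotonic mixing networks
    are another.
    """
    per_node = [q(h[:, i]).gather(1, a[:, i:i + 1]) for i, q in enumerate(local_qs)]
    return torch.cat(per_node, dim=1).sum(dim=1, keepdim=True)


def train(local_qs, replay_sampler, gamma=0.99, lr=1e-3, grad_tol=1e-4, max_rounds=10_000):
    """Run training rounds until the estimated gradient of the overall loss is near zero."""
    params = [p for q in local_qs for p in q.parameters()]
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(max_rounds):
        h, a, r, h_next = replay_sampler()          # batched (state, action, reward, next state)
        with torch.no_grad():                       # bootstrapped TD target
            next_q = torch.stack(
                [q(h_next[:, i]).max(dim=1).values for i, q in enumerate(local_qs)],
                dim=1,
            ).sum(dim=1, keepdim=True)
            target = r + gamma * next_q
        loss = nn.functional.mse_loss(q_tot(local_qs, h, a), target)  # overall loss L_t(theta_t)
        opt.zero_grad()
        loss.backward()                             # estimated gradient of the overall loss
        grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
        opt.step()
        if grad_norm < grad_tol:                    # "close to zero" stopping criterion
            break
    return local_qs                                 # trained Q_n,trained for each TX node
```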