US 11,856,524 B2
Systems and methods for deep reinforcement learning for optimal power control in wireless networks
Ziad El-Jamous, Rockville, MD (US); Yalin Evren Sagduyu, Gaithersburg, MD (US); and Kemal Davaslioglu, Ellicott City, MD (US)
Assigned to INTELLIGENT AUTOMATION, LLC, Rockville, MD (US)
Filed by INTELLIGENT AUTOMATION, LLC, Rockville, MD (US)
Filed on Dec. 7, 2021, as Appl. No. 17/544,649.
Prior Publication US 2023/0180138 A1, Jun. 8, 2023
Int. Cl. H04W 52/22 (2009.01); H04W 52/06 (2009.01)
CPC H04W 52/223 (2013.01) [H04W 52/06 (2013.01)]
33 Claims
 
1. A wireless network comprising a plurality of nodes, each respective node comprising:
a respective transceiver having a respective transmit power level and configured to communicate, and
a respective processor configured to:
obtain, as current state data, a time t1, the respective transmit power level at the time t1 of the respective transceiver, a respective link quality at the time t1 associated with the respective transceiver, and a respective signal strength at the time t1 associated with the respective transceiver,
calculate, as a reward, a measure pertaining to energy efficiency at the time t1 based at least on (i) a throughput at the time t1 based at least on the respective link quality at the time t1 of the respective transceiver and the respective signal strength at the time t1 associated with the respective transceiver, and (ii) an amount of energy consumption at the time t1 based at least on the respective transmit power level at the time t1 associated with the respective transceiver,
store, in a memory associated with the processor, in a collected parameters database:
(i) the current state data,
(ii) an identification of the respective node,
(iii) the reward, and
(iv) previously obtained state data, from a time t0 previous to the time t1, comprising a respective transmit power level at the time t0 of the respective transceiver, a respective link quality at the time t0 associated with the respective transceiver, and a respective signal strength at the time t0 associated with the respective transceiver,
provide the current state data and data from the collected parameters database to a reinforced neural network,
select an action, using the reinforced neural network, from amongst a set of possible actions relating to the transmit power level of the respective transceiver, based at least on the current state data and the data from the collected parameters database, and
output the selected action, such that the respective transmit power level of the respective transceiver of the associated node is selectively modified in accordance with the selected action.
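
The per-node steps recited in claim 1 (obtain state, compute an energy-efficiency reward, store a record, select and output a power action) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. A tabular Q-learning table with epsilon-greedy selection stands in for the reinforced neural network. The names State, Node, and ACTIONS, the throughput proxy (link quality times Shannon spectral efficiency), the -95 dBm noise floor, the 1 s interval, and the 1 dB power steps are all hypothetical choices that the claim does not specify.

```python
import math
import random
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical action set: change transmit power by -1, 0, or +1 dB.
ACTIONS = (-1.0, 0.0, 1.0)


@dataclass(frozen=True)
class State:
    t: float             # observation time (t0, t1, ...)
    tx_power_dbm: float  # transmit power level of the transceiver
    link_quality: float  # assumed to be normalized to [0, 1]
    rssi_dbm: float      # signal strength


def throughput(state, noise_dbm=-95.0):
    """Assumed proxy: link quality times Shannon spectral efficiency."""
    snr = 10 ** ((state.rssi_dbm - noise_dbm) / 10)
    return state.link_quality * math.log2(1 + snr)


def energy(state, interval_s=1.0):
    """Energy in millijoules spent at this transmit power over one interval."""
    return 10 ** (state.tx_power_dbm / 10) * interval_s


def reward(state):
    """Energy-efficiency measure: throughput per unit of energy consumed."""
    return throughput(state) / energy(state)


class Node:
    def __init__(self, node_id, epsilon=0.1, alpha=0.5, gamma=0.9, seed=0):
        self.node_id = node_id
        self.db = []                 # collected parameters database
        self.q = defaultdict(float)  # Q-table standing in for the network
        self.epsilon, self.alpha, self.gamma = epsilon, alpha, gamma
        self.rng = random.Random(seed)

    @staticmethod
    def _key(state):
        # Coarse discretization so that states can index the Q-table.
        return (round(state.tx_power_dbm),
                round(state.link_quality, 1),
                round(state.rssi_dbm / 5))

    def step(self, prev_state, prev_action, cur_state):
        r = reward(cur_state)
        # Store current state, node identification, reward, and previous state.
        self.db.append({"node_id": self.node_id, "state": cur_state,
                        "reward": r, "prev_state": prev_state})
        # One Q-learning update for the transition prev_state -> cur_state.
        if prev_state is not None and prev_action is not None:
            best_next = max(self.q[(self._key(cur_state), a)] for a in ACTIONS)
            k = (self._key(prev_state), prev_action)
            self.q[k] += self.alpha * (r + self.gamma * best_next - self.q[k])
        # Epsilon-greedy selection from the set of possible power actions.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(self._key(cur_state), a)])
```

In this sketch the caller would apply the returned value to the transceiver, for example by adding it to the current transmit power in dBm and clamping the result to the radio's supported range, then pass the resulting observation into the next call to step.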