| CPC H04W 52/0206 (2013.01) [H04W 24/10 (2013.01); H04W 52/02 (2013.01)] | 11 Claims |

|
1. An apparatus comprising:
at least one processor; and
at least one memory including computer program code, the at least one memory and computer program code configured to, with the at least one processor, cause the apparatus at least to perform:
determining, for a group of cells in a radio access network, an optimal action as an output of a first trained model, using the first trained model, which is based on reinforcement learning and maximizes a long term reward on tradeoff between throughput and power saving within the group of cells, the first trained model taking as input a state, wherein the optimal action is one of actions comprising at least modifying power settings of one or more cells, switching on one or more cells, switching off one or more cells, and retaining the current cell statuses in cells of the group of cells, and wherein the state comprises at least one of a load estimate and, per a cell in the group of cells, a current cell status;
causing the optimal action to be performed in response to the optimal action being modifying power settings of one or more cells, or switching on one or more cells, or switching off one or more cells;
receiving load and performance metrics of cells that are switched on, and power consumed by the cells that are switched on; and
updating the first trained model in response to the receiving load and performance metrics of cells that are switched on, and power consumed by the cells that are switched on.
|
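The loop recited in claim 1 (select an action from a state, perform it unless it is "retain", receive load, performance and power measurements, then update the model) can be illustrated with a minimal sketch. The claim only requires a trained model based on reinforcement learning; tabular Q-learning is used here purely as an assumed stand-in. The class name `CellGroupAgent`, the helper `must_execute`, the action labels, the linear reward `throughput - power_weight * power`, and all hyperparameter values are hypothetical and do not come from the claim.

```python
import random
from collections import defaultdict

# Hypothetical labels for the four action types listed in claim 1.
ACTIONS = ("modify_power", "switch_on", "switch_off", "retain")


class CellGroupAgent:
    """Tabular Q-learning stand-in for the 'first trained model' (an assumption)."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.0, power_weight=0.5, seed=0):
        self.alpha = alpha                # learning rate
        self.gamma = gamma                # discount factor for the long-term reward
        self.epsilon = epsilon            # exploration probability
        self.power_weight = power_weight  # assumed throughput/power trade-off weight
        self.q = defaultdict(float)       # (state, action) -> estimated value
        self.rng = random.Random(seed)

    def select_action(self, state):
        """Return the action with the highest estimated value (epsilon-greedy)."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def reward(self, throughput, power):
        """Assumed reward: throughput minus weighted power consumption."""
        return throughput - self.power_weight * power

    def update(self, state, action, throughput, power, next_state):
        """Apply one Q-learning update from the received measurements."""
        r = self.reward(throughput, power)
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        td_target = r + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
        return r


def must_execute(action):
    """Only the non-'retain' actions are caused to be performed."""
    return action != "retain"


# Usage: the state is (load estimate bucket, per-cell status), both illustrative.
agent = CellGroupAgent()
state = (2, ("on", "on", "off"))
action = agent.select_action(state)
if must_execute(action):
    pass  # here the network would be instructed to apply the action
agent.update(state, action, throughput=10.0, power=4.0, next_state=state)
```

In this sketch the state tuple combines a discretized load estimate with the current status of each cell, mirroring the two state components named in the claim. A deployed system would more likely use a function approximator than a lookup table, since the number of cell-status combinations grows exponentially with the size of the group.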