CPC H04W 36/22 (2013.01) [G06N 3/044 (2023.01); G06N 3/08 (2013.01); H04W 24/02 (2013.01); H04W 28/0861 (2023.05)] | 20 Claims |
1. A server method by a server for a hierarchical dual reinforcement learning process in a communication system, wherein the communication system includes a first base station and a second base station, the server method comprising:
updating a higher policy based on a system set of parameters and a second loss to improve a higher-level reward at a first time;
selecting a second action based on the higher policy;
selecting a goal for a lower policy based on the system set of parameters and based on the second action;
updating the lower policy based on the system set of parameters and based on a first loss to improve a lower-level reward at the first time;
selecting a first action based on the goal and based on the first loss; and
applying a united action to the communication system, wherein the united action is based on the first action and the second action.
|
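The sequence of steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and everything in it beyond the claim wording is an assumption made for illustration: the two-station load model, the tabular softmax policies updated with a REINFORCE-style step, the reward definitions, and all names (`SoftmaxPolicy`, `select_goal`, `apply_united_action`, `run`). The "system set of parameters" is reduced here to the two base-station loads, the second action is a coarse shift direction, the goal is a signed target shift, and the first action is a step size.

```python
import math
import random

DIRECTIONS = (-1, 0, 1)   # second (higher-level) action: coarse shift direction
STEP_UNIT = 0.05          # first (lower-level) action i shifts i * STEP_UNIT of load


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class SoftmaxPolicy:
    """Hypothetical tabular policy: one row of action preferences per context key."""

    def __init__(self, n_actions, lr=0.1):
        self.n_actions, self.lr, self.prefs = n_actions, lr, {}

    def probs(self, key):
        return softmax(self.prefs.setdefault(key, [0.0] * self.n_actions))

    def select(self, key, rng):
        return rng.choices(range(self.n_actions), weights=self.probs(key))[0]

    def update(self, key, action, reward):
        """One REINFORCE step; returns the loss -reward * log pi(action | key)."""
        p = self.probs(key)
        loss = -reward * math.log(p[action])
        row = self.prefs[key]
        for a in range(self.n_actions):
            row[a] += self.lr * reward * ((1.0 if a == action else 0.0) - p[a])
        return loss


def imbalance_sign(loads, tol=0.05):
    """Coarse view of the system parameters: which station is more loaded."""
    diff = loads[0] - loads[1]
    return 0 if abs(diff) <= tol else (1 if diff > 0 else -1)


def select_goal(loads, second):
    """Goal for the lower policy: signed number of step units to shift."""
    half_gap_units = round(abs(loads[0] - loads[1]) / 2 / STEP_UNIT)
    return DIRECTIONS[second] * min(2, half_gap_units)


def apply_united_action(loads, first, second):
    """Combine both actions into one load shift from station 0 to station 1."""
    shift = DIRECTIONS[second] * first * STEP_UNIT
    # Clamp so both loads stay within [0, 1].
    shift = max(-loads[1], loads[0] - 1.0, min(shift, loads[0], 1.0 - loads[1]))
    return [loads[0] - shift, loads[1] + shift], shift


def run(steps=200, seed=0):
    rng = random.Random(seed)
    loads = [0.8, 0.2]                      # assumed initial loads of the two stations
    higher, lower = SoftmaxPolicy(len(DIRECTIONS)), SoftmaxPolicy(3)
    prev, history = None, []
    first_loss = second_loss = None
    for _ in range(steps):
        state = imbalance_sign(loads)
        if prev is not None:                # update the higher policy (second loss)
            second_loss = higher.update(prev["state"], prev["second"], prev["high_r"])
        second = higher.select(state, rng)  # select the second action
        goal = select_goal(loads, second)   # select a goal for the lower policy
        if prev is not None:                # update the lower policy (first loss)
            first_loss = lower.update((prev["state"], prev["goal"]),
                                      prev["first"], prev["low_r"])
        first = lower.select((state, goal), rng)   # select the first action
        new_loads, shift = apply_united_action(loads, first, second)
        high_r = abs(loads[0] - loads[1]) - abs(new_loads[0] - new_loads[1])
        low_r = -abs(shift - goal * STEP_UNIT)     # how closely the goal was met
        prev = {"state": state, "second": second, "goal": goal,
                "first": first, "high_r": high_r, "low_r": low_r}
        history.append((first, second, goal, tuple(new_loads)))
        loads = new_loads
    return history, higher, lower
```

In this sketch each policy is updated from the previous transition before it selects its next action, which mirrors the update-then-select ordering of the claim. Calling `run(steps=50, seed=1)` returns the per-step tuples of first action, second action, goal, and resulting loads, along with the two trained policies.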