US 11,751,115 B2
Hierarchical policy learning for hybrid communication load balancing
Jikun Kang, Montreal (CA); Xi Chen, Montreal (CA); Di Wu, Montreal (CA); Yi Tian Xu, Montreal (CA); Xue Liu, Montreal (CA); Gregory Lewis Dudek, Westmount (CA); Taeseop Lee, Seoul (KR); and Intaik Park, Seoul (KR)
Assigned to SAMSUNG ELECTRONICS CO., LTD., Suwon-si (KR)
Filed by SAMSUNG ELECTRONICS CO., LTD., Suwon-si (KR)
Filed on Jun. 30, 2021, as Appl. No. 17/363,918.
Claims priority of provisional application 63/110,515, filed on Nov. 6, 2020.
Prior Publication US 2022/0150786 A1, May 12, 2022
Int. Cl. H04W 36/22 (2009.01); G06N 3/08 (2023.01); H04W 24/02 (2009.01); G06N 3/044 (2023.01); H04W 28/086 (2023.01)

CPC H04W 36/22 (2013.01) [G06N 3/044 (2023.01); G06N 3/08 (2013.01); H04W 24/02 (2013.01); H04W 28/0861 (2023.05)]

20 Claims

1. A server method by a server for a hierarchical dual reinforcement learning process in a communication system, wherein the communication system includes a first base station and a second base station, the server method comprising:

updating a higher policy based on a system set of parameters and a second loss to improve a higher-level reward at a first time;

selecting a second action based on the higher policy;

selecting a goal for a lower policy based on the system set of parameters and based on the second action;

updating the lower policy based on the system set of parameters and based on a first loss to improve a lower-level reward at the first time;

selecting a first action based on the goal and based on the first loss; and

applying a united action to the communication system, wherein the united action is based on the first action and the second action.
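
The ordering of the steps in claim 1 can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the patented implementation: the tabular policy, the load-gap state, the COARSE and FINE shift tables, the goal formula, and both loss definitions are hypothetical choices made only so that the sequence (update higher policy, select second action, select goal, update lower policy, select first action, apply united action) can be run end to end on two base-station loads.

```python
COARSE = [-4.0, 0.0, 4.0]  # hypothetical higher-level load shifts (second action)
FINE = [-1.0, 0.0, 1.0]    # hypothetical lower-level load shifts (first action)


class TabularPolicy:
    """Toy policy (assumption): one preference score per (state, action) pair."""

    def __init__(self, n_states, n_actions, lr=0.1):
        self.prefs = [[0.0] * n_actions for _ in range(n_states)]
        self.lr = lr

    def update(self, state, action, loss):
        # Penalize the action that produced the loss in that state.
        self.prefs[state][action] -= self.lr * loss

    def select(self, state, bias=None):
        # Greedy choice; an optional per-action bias lets a goal steer it.
        bias = bias or [0.0] * len(self.prefs[state])
        scores = [p + b for p, b in zip(self.prefs[state], bias)]
        return max(range(len(scores)), key=scores.__getitem__)


def run(loads, steps=10):
    """Repeat the claimed sequence `steps` times on two base-station loads."""
    loads = list(loads)
    higher, lower = TabularPolicy(2, 3), TabularPolicy(2, 3)
    prev = None  # (state, first_action, second_action, first_loss, second_loss)
    history = []
    for _ in range(steps):
        gap = loads[0] - loads[1]  # toy "system set of parameters"
        state = 0 if gap > 0 else 1
        if prev is not None:
            # Update the higher policy using the second loss.
            higher.update(prev[0], prev[2], prev[4])
        second = higher.select(state)  # select the second action
        # Goal (assumption): residual gap left after the coarse shift.
        goal = gap - 2 * COARSE[second]
        if prev is not None:
            # Update the lower policy using the first loss.
            lower.update(prev[0], prev[1], prev[3])
        # Select the first action from the goal and the loss-trained preferences.
        bias = [-abs(goal - 2 * f) for f in FINE]
        first = lower.select(state, bias)
        united = (first, second)  # united action built from both actions
        # Apply the united action: move load from station 1 to station 2,
        # clamped so that neither load becomes negative.
        shift = COARSE[second] + FINE[first]
        shift = max(-loads[1], min(loads[0], shift))
        loads[0] -= shift
        loads[1] += shift
        first_loss = abs(goal - 2 * FINE[first])   # distance from the goal
        second_loss = abs(loads[0] - loads[1])     # remaining load imbalance
        prev = (state, first, second, first_loss, second_loss)
        history.append(united)
    return loads, history


if __name__ == "__main__":
    final_loads, actions = run([10.0, 2.0], steps=8)
    print(final_loads, actions)
```

In this sketch the higher-level reward is taken to be the negative load imbalance and the lower-level reward the negative distance from the goal. The claim itself does not fix either definition, so other choices would fit the same sequence.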