US 11,871,251 B2
Method of association of user equipment in a cellular network according to a transferable association policy
Mohamed Sana, Grenoble (FR); Nicola Di Pietro, Grenoble (FR); Emilio Calvanese Strinati, Grenoble (FR); and Benoît Miscopein, Grenoble (FR)
Assigned to COMMISSARIAT A L'ENERGIE ATOMIQUE ET AUX ENERGIES ALTERNATIVES, Paris (FR)
Filed by COMMISSARIAT A L'ENERGIE ATOMIQUE ET AUX ENERGIES ALTERNATIVES, Paris (FR)
Filed on Sep. 29, 2021, as Appl. No. 17/449,337.
Claims priority of application No. 20 09989 (FR), filed on Sep. 30, 2020.
Prior Publication US 2022/0104034 A1, Mar. 31, 2022
Int. Cl. H04W 24/02 (2009.01); G06N 3/04 (2023.01); G06N 3/08 (2023.01); H04L 5/00 (2006.01)

CPC H04W 24/02 (2013.01) [G06N 3/04 (2013.01); G06N 3/08 (2013.01); H04L 5/006 (2013.01); H04L 5/0048 (2013.01); H04L 5/0055 (2013.01); H04L 5/0078 (2013.01)]

12 Claims

1. A method for associating user equipment with base stations of a cellular network, the association method implementing a multi-agent reinforcement learning method, each user equipment being represented by an agent, comprising:

when a user equipment u_j enters the network, downloading an instance of a meta model (π_w,0) to initialise an association strategy model (π_w,j) using the agent representing the user equipment, the meta model having an architecture independent of the user;

constructing a local observation vector (o^l_j(t)) comprising observables relating to the equipment and a global observation vector (o^g_j(t)) comprising observables relating to an environment of the equipment, using the agent representing the user equipment;

projecting the local observation vector and the global observation vector into the same reference space and combining the two vectors thus projected to provide a code (c_j(t)) of a state (s_j(t)) of the agent, using the agent representing the user equipment;

updating the association strategy model by means of a policy gradient method, using the agent representing the user equipment; and

deciding on the base station with which to associate and receiving in return a common reward calculated from a utility function of the network, using the agent representing the user equipment.
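
The steps recited in claim 1 can be illustrated with a minimal sketch in plain Python. It is not the patented implementation. Every name in it (`make_meta`, `Agent`, `encode`, `act`, `update`, `network_utility`) is hypothetical, as are the vector dimensions and the learning rate. Three simplifications are assumed: the two projected vectors are combined by summation followed by `tanh`, the policy gradient method is plain REINFORCE applied only to a linear softmax head, and the network utility is a sum of log rates with each base station's capacity shared equally among its associated equipment.

```python
import math
import random


def make_meta(rng, n_local, n_global, dim, n_bs):
    """Hypothetical meta model: weights whose shapes do not depend on the user."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {"A": mat(dim, n_local), "B": mat(dim, n_global), "W": mat(n_bs, dim)}


def matvec(m, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(z):
    top = max(z)
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return [v / total for v in e]


class Agent:
    """Agent representing one user equipment."""

    def __init__(self, meta, rng):
        # Initialise the local association model from a copy of the meta model.
        self.A = [row[:] for row in meta["A"]]  # projection of local observation
        self.B = [row[:] for row in meta["B"]]  # projection of global observation
        self.W = [row[:] for row in meta["W"]]  # policy head, one row per base station
        self.rng = rng

    def encode(self, o_local, o_global):
        # Project both vectors into the same space, then combine them.
        # Assumption: combination by element-wise sum followed by tanh.
        pl = matvec(self.A, o_local)
        pg = matvec(self.B, o_global)
        return [math.tanh(a + b) for a, b in zip(pl, pg)]

    def act(self, code):
        # Sample a base station index from the softmax policy.
        probs = softmax(matvec(self.W, code))
        action = self.rng.choices(range(len(probs)), weights=probs)[0]
        return action, probs

    def update(self, code, action, probs, reward, lr=0.1):
        # REINFORCE step on the policy head only (a simplification):
        # d log pi(a|s) / d W[k] = (1[k == a] - p_k) * code
        for k, row in enumerate(self.W):
            g = (1.0 if k == action else 0.0) - probs[k]
            for i in range(len(row)):
                row[i] += lr * reward * g * code[i]


def network_utility(actions, capacity):
    """Assumed common reward: sum of log rates under equal capacity sharing."""
    load = [actions.count(b) for b in range(len(capacity))]
    return sum(math.log(capacity[a] / load[a]) for a in actions)


if __name__ == "__main__":
    rng = random.Random(0)
    meta = make_meta(rng, n_local=2, n_global=3, dim=4, n_bs=2)
    agents = [Agent(meta, rng) for _ in range(3)]  # three equipment enter the network
    capacity = [10.0, 5.0]                         # illustrative values only
    for _ in range(50):
        codes = [a.encode([rng.random(), rng.random()],
                          [rng.random() for _ in range(3)]) for a in agents]
        choices = [a.act(c) for a, c in zip(agents, codes)]
        actions = [act for act, _ in choices]
        reward = network_utility(actions, capacity)  # same reward for every agent
        for a, c, (act, probs) in zip(agents, codes, choices):
            a.update(c, act, probs, reward)
    print(actions, round(reward, 3))
```

Because every agent starts from a copy of the same meta model and the weight shapes do not depend on how many equipment are present, a newly arriving equipment can be added by constructing one more `Agent` without retraining the others.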