US 11,871,251 B2
Method of association of user equipment in a cellular network according to a transferable association policy
Mohamed Sana, Grenoble (FR); Nicola Di Pietro, Grenoble (FR); Emilio Calvanese Strinati, Grenoble (FR); and Benoît Miscopein, Grenoble (FR)
Assigned to COMMISSARIAT A L'ENERGIE ATOMIQUE ET AUX ENERGIES ALTERNATIVES, Paris (FR)
Filed by COMMISSARIAT A L'ENERGIE ATOMIQUE ET AUX ENERGIES ALTERNATIVES, Paris (FR)
Filed on Sep. 29, 2021, as Appl. No. 17/449,337.
Claims priority of application No. 20 09989 (FR), filed on Sep. 30, 2020.
Prior Publication US 2022/0104034 A1, Mar. 31, 2022
Int. Cl. H04W 24/02 (2009.01); G06N 3/04 (2023.01); G06N 3/08 (2023.01); H04L 5/00 (2006.01)
CPC H04W 24/02 (2013.01) [G06N 3/04 (2013.01); G06N 3/08 (2013.01); H04L 5/006 (2013.01); H04L 5/0048 (2013.01); H04L 5/0055 (2013.01); H04L 5/0078 (2013.01)]
12 Claims
 
1. A method for associating user equipment with base stations of a cellular network, the association method implementing a multi-agent reinforcement learning method, each user equipment being represented by an agent, comprising:
when a user equipment u_j enters the network, downloading an instance of a meta model (π_{w,0}) to initialise an association strategy model (π_{w,j}) using the agent representing the user equipment, the meta model having an architecture independent of the user;
constructing a local observation vector (o_j^l(t)) comprising observables relating to the equipment and a global observation vector (o_j^g(t)) comprising observables relating to an environment of the equipment, using the agent representing the user equipment;
projecting the local observation vector and the global observation vector into the same reference space and combining the two vectors thus projected to provide a code (c_j(t)) of a state (s_j(t)) of the agent, using the agent representing the user equipment;
updating the association strategy model by means of a policy gradient method, using the agent representing the user equipment;
deciding on the base station with which to associate and receiving in return a common reward calculated from a utility function of the network, using the agent representing the user equipment.
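
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything below is an assumption made for illustration: the function names, the fixed observation sizes, the tanh-of-sum combination of the two projections, the single linear output layer, the REINFORCE-style update standing in for the "policy gradient method", and the sum-of-log-rates utility standing in for the network utility function.

```python
import copy
import numpy as np

def make_meta_model(n_local, n_global, n_bs, d=8, seed=0):
    """Hypothetical meta model: the same weights are handed to every new agent."""
    rng = np.random.default_rng(seed)
    return {
        "W_local": 0.1 * rng.standard_normal((d, n_local)),    # projects local observables
        "W_global": 0.1 * rng.standard_normal((d, n_global)),  # projects global observables
        "W_out": 0.1 * rng.standard_normal((n_bs, d)),         # scores each base station
    }

def encode_state(model, o_local, o_global):
    """Project both observation vectors into one d-dimensional space and combine them.
    Summing followed by tanh is an assumed combination rule."""
    return np.tanh(model["W_local"] @ o_local + model["W_global"] @ o_global)

def policy(model, code):
    """Softmax distribution over base stations given the state code."""
    logits = model["W_out"] @ code
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    return e / e.sum()

def common_reward(rates):
    """Assumed network utility shared by all agents: sum of log(1 + rate)."""
    return float(np.sum(np.log1p(rates)))

def policy_gradient_step(model, o_local, o_global, action, reward, lr=0.05):
    """One REINFORCE-style ascent step on reward * log pi(action | state), in place."""
    code = encode_state(model, o_local, o_global)
    probs = policy(model, code)
    g = -probs
    g[action] += 1.0                    # d log pi(action) / d logits
    d_code = model["W_out"].T @ g       # backpropagate through the output layer
    d_z = d_code * (1.0 - code ** 2)    # backpropagate through tanh
    model["W_out"] += lr * reward * np.outer(g, code)
    model["W_local"] += lr * reward * np.outer(d_z, o_local)
    model["W_global"] += lr * reward * np.outer(d_z, o_global)

# Usage: a user equipment enters the network and copies the meta model.
meta = make_meta_model(n_local=3, n_global=4, n_bs=2)
agent = copy.deepcopy(meta)                       # agent-specific strategy model
o_l = np.array([0.5, -1.0, 0.2])                  # made-up local observables
o_g = np.array([1.0, 0.3, -0.7, 0.1])             # made-up global observables
probs = policy(agent, encode_state(agent, o_l, o_g))
action = int(np.argmax(probs))                    # base station chosen for association
reward = common_reward(np.array([1.2, 0.4]))      # made-up per-user rates
policy_gradient_step(agent, o_l, o_g, action, reward)
```

In this sketch the copy of the meta model is what makes the policy transferable: a newly arrived agent starts from shared weights and then adapts them locally, while the shared reward ties every agent's update to the same network-level objective.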