CPC H04W 24/02 (2013.01) [G06N 3/04 (2013.01); G06N 3/08 (2013.01); H04L 5/006 (2013.01); H04L 5/0048 (2013.01); H04L 5/0055 (2013.01); H04L 5/0078 (2013.01)] | 12 Claims |
1. A method for associating user equipment with base stations of a cellular network, the association method implementing a multi-agent reinforcement learning method, each user equipment being represented by an agent, comprising:
when a user equipment uj enters the network, downloading an instance of a meta model (πw,0) to initialise an association strategy model (πw,j) using the agent representing the user equipment, the meta model having an architecture independent of the user;
constructing a local observation vector (olj(t)) comprising observables relating to the equipment and a global observation vector (ogj(t)) comprising observables relating to an environment of the equipment, using the agent representing the user equipment;
projecting the local observation vector and the global observation vector into the same reference space and combining the two vectors thus projected to provide a code (cj(t)) of a state (sj(t)) of the agent, using the agent representing the user equipment;
updating the association strategy model by means of a policy gradient method, using the agent representing the user equipment; and
deciding on the base station with which to associate and receiving in return a common reward calculated from a utility function of the network, using the agent representing the user equipment.
|
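The steps of claim 1 can be illustrated with a minimal sketch, offered only as one possible reading and not as the claimed implementation. All names (`make_meta`, `Agent`, `encode`, `act`, `update`) are hypothetical. The sketch assumes linear projections of the two observation vectors into a shared space, combination by summation followed by `tanh`, a softmax policy over base stations, and a REINFORCE-style update (one member of the policy gradient family) applied to the policy head only. The common reward is assumed to be computed elsewhere from the network utility function and passed in.

```python
import math
import random


def make_meta(n_local, n_global, dim, n_stations, rng):
    """Hypothetical meta model: fixed-size weights shared by every new agent."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {
        "W_local": mat(dim, n_local),      # projects the local observation vector
        "W_global": mat(dim, n_global),    # projects the global observation vector
        "W_policy": mat(n_stations, dim),  # maps the state code to station logits
    }


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class Agent:
    """Hypothetical agent representing one user equipment."""

    def __init__(self, meta, rng):
        # "Downloading an instance of the meta model": copy its weights.
        self.W_local = [row[:] for row in meta["W_local"]]
        self.W_global = [row[:] for row in meta["W_global"]]
        self.W_policy = [row[:] for row in meta["W_policy"]]
        self.rng = rng

    def encode(self, o_local, o_global):
        # Project both vectors into the same space, then combine them
        # (summation + tanh is an assumption) to obtain the state code.
        p_l = matvec(self.W_local, o_local)
        p_g = matvec(self.W_global, o_global)
        return [math.tanh(a + b) for a, b in zip(p_l, p_g)]

    def act(self, code):
        # Sample the base station to associate with from a softmax policy.
        probs = softmax(matvec(self.W_policy, code))
        action = self.rng.choices(range(len(probs)), weights=probs)[0]
        return action, probs

    def update(self, code, action, probs, reward, lr=0.1):
        # REINFORCE step on the policy head: the gradient of log pi(a | code)
        # with respect to logit k is (1[k == a] - probs[k]).
        for k, row in enumerate(self.W_policy):
            g = (1.0 if k == action else 0.0) - probs[k]
            for i in range(len(row)):
                row[i] += lr * reward * g * code[i]


if __name__ == "__main__":
    rng = random.Random(0)
    meta = make_meta(n_local=3, n_global=4, dim=5, n_stations=2, rng=rng)
    agent = Agent(meta, rng)
    code = agent.encode([0.2, 0.5, 0.1], [1.0, 0.3, 0.7, 0.4])
    station, probs = agent.act(code)
    agent.update(code, station, probs, reward=1.0)  # reward value is illustrative
    print("chosen station:", station)
```

Because every agent starts from the same fixed-size weights, a user equipment entering the network needs no architecture of its own, which is one way to read "an architecture independent of the user". In this sketch a positive common reward raises the probability of the station just chosen, and a negative one lowers it.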