| CPC H04W 36/0033 (2013.01) [H04W 36/00835 (2018.08); H04W 36/008375 (2023.05); H04W 36/322 (2023.05); H04W 36/324 (2023.05)] | 19 Claims | 

| 
               1. A method implemented by a network node in a wireless communication network comprising multiple network nodes, the method comprising: 
                selecting, based on current radio environment conditions experienced by a wireless device, a radio environment context from a plurality of stored radio environment contexts indicative of radio conditions at respective locations in the wireless communication network, the selecting of the radio environment context being based on minimizing, using a Q-table comprising the plurality of stored radio environment contexts, a distance metric between the current radio environment conditions and each of the stored radio environment contexts in the Q-table, the radio environment context being a closest one of the stored radio environment contexts;
                selecting a target network node for a handover using a mapping between the selected radio environment context and one or more candidate network nodes that are predicted, based on historical measurement data collected over time from a plurality of wireless devices, to provide the highest expected rewards in terms of a performance metric given the selected radio environment context; and 
                causing the handover of the wireless device to the target network node. 
               |
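
The selection steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes that a radio environment context is a tuple of signal-strength measurements, that the distance metric is Euclidean (the claim does not name one), and that the Q-table maps each stored context to an expected reward per candidate node. The names `Q_TABLE`, `select_context`, `select_target_node`, and `choose_handover_target`, and all numeric values, are hypothetical. Learning the rewards from historical measurement data and the handover signalling itself are outside the sketch.

```python
import math

# Hypothetical Q-table: each key is a stored radio environment context
# (assumed here to be signal strengths in dBm toward three cells), and each
# value maps candidate network nodes to an expected reward. All numbers are
# made up for illustration.
Q_TABLE = {
    (-70.0, -95.0, -100.0): {"node_a": 0.9, "node_b": 0.2, "node_c": 0.1},
    (-95.0, -72.0, -98.0): {"node_a": 0.3, "node_b": 0.8, "node_c": 0.2},
    (-99.0, -96.0, -75.0): {"node_a": 0.1, "node_b": 0.4, "node_c": 0.7},
}


def euclidean(a, b):
    """Assumed distance metric between two measurement vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_context(current, q_table, distance=euclidean):
    """Return the stored context closest to the current radio conditions."""
    return min(q_table, key=lambda ctx: distance(current, ctx))


def select_target_node(context, q_table):
    """Return the candidate node with the highest expected reward."""
    rewards = q_table[context]
    return max(rewards, key=rewards.get)


def choose_handover_target(current, q_table=Q_TABLE):
    """Combine both selection steps; triggering the handover is not shown."""
    context = select_context(current, q_table)
    return context, select_target_node(context, q_table)


# Current conditions reported for a wireless device (hypothetical values).
context, target = choose_handover_target((-93.0, -74.0, -99.0))
print(context, target)
```

With these made-up values, the second stored context is nearest to the reported conditions, so the sketch picks the node with the highest expected reward in that row. A different distance metric can be passed to `select_context` through its `distance` parameter.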