US 11,963,047 B2
Link change decision-making using reinforcement learning based on tracked rewards and outcomes in a wireless communication system
Athanasios Karapantelakis, Solna (SE); Elena Fersman, Stockholm (SE); Rafia Inam, Västerås (SE); Markus Andersson, Boden (SE); and David Lindero, Luleå (SE)
Assigned to Telefonaktiebolaget LM Ericsson (publ), Stockholm (SE)
Appl. No. 17/286,065
Filed by Telefonaktiebolaget LM Ericsson (publ), Stockholm (SE)
PCT Filed Oct. 18, 2018, PCT No. PCT/EP2018/078509
§ 371(c)(1), (2) Date Apr. 16, 2021,
PCT Pub. No. WO2020/078552, PCT Pub. Date Apr. 23, 2020.
Prior Publication US 2021/0377822 A1, Dec. 2, 2021
Int. Cl. H04W 36/00 (2009.01); G06N 20/00 (2019.01); H04W 36/16 (2009.01); H04W 36/30 (2009.01)
CPC H04W 36/0058 (2018.08) [G06N 20/00 (2019.01); H04W 36/00837 (2018.08); H04W 36/165 (2013.01); H04W 36/30 (2013.01)]
24 Claims
 
1. A method for link change decision-making using reinforcement learning, the method comprising:
tracking rewards earned for, and outcomes of, respective link change decisions, wherein possible outcomes of a link change decision to change a serving link of a wireless device to a target link include at least (i) a change of the serving link of the wireless device from the target link to another link and (ii) a network-initiated disconnect of the wireless device from the target link, wherein tracking rewards earned for, and outcomes of, respective link change decisions comprises, for each link change decision to change a serving link of a wireless device to a target link, tracking:
an outcome of the link change decision reflecting how the wireless device became detached or disconnected from the target link after the serving link of the wireless device was changed to the target link according to the link change decision; and
a reward earned for the link change decision characterizing an extent to which the link change decision was deemed successful up until the outcome of the link change decision occurred and/or as measured over a time period during which the wireless device was connected to, camped on, or attached via the target link; and
making a link change decision based on the tracked rewards and outcomes.
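
The tracking and decision steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The names `Outcome`, `Record`, `LinkChangeTracker`, `disconnect_penalty`, and `epsilon` are hypothetical. The scoring rule (mean tracked reward minus a penalty proportional to the rate of network-initiated disconnects) and the epsilon-greedy selection are assumptions chosen for illustration; the claim does not specify how tracked rewards and outcomes are combined.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    """How the device left the target link (the two outcomes named in claim 1)."""
    CHANGED_TO_OTHER_LINK = "changed_to_other_link"  # outcome (i)
    NETWORK_DISCONNECT = "network_disconnect"        # outcome (ii)


@dataclass
class Record:
    reward: float     # extent to which the decision was deemed successful
    outcome: Outcome  # how the device became detached from the target link


class LinkChangeTracker:
    """Tracks (reward, outcome) per target link and picks the next target.

    Assumption: the score is the mean reward minus a penalty times the
    disconnect rate; the claim itself does not prescribe a formula.
    """

    def __init__(self, disconnect_penalty=1.0, epsilon=0.0, rng=None):
        self.history = defaultdict(list)
        self.disconnect_penalty = disconnect_penalty
        self.epsilon = epsilon  # exploration probability (assumed)
        self.rng = rng or random.Random(0)

    def track(self, target_link, reward, outcome):
        """Record the reward and outcome of one link change decision."""
        self.history[target_link].append(Record(reward, outcome))

    def score(self, target_link):
        """Combine tracked rewards and outcomes into one value per link."""
        records = self.history.get(target_link)
        if not records:
            return 0.0  # no history yet: neutral score (assumed)
        mean_reward = sum(r.reward for r in records) / len(records)
        disconnects = sum(
            r.outcome is Outcome.NETWORK_DISCONNECT for r in records
        )
        return mean_reward - self.disconnect_penalty * disconnects / len(records)

    def decide(self, candidates):
        """Make a link change decision based on tracked rewards and outcomes."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(candidates)  # explore
        return max(candidates, key=self.score)  # exploit


tracker = LinkChangeTracker()
tracker.track("cell_A", 0.9, Outcome.NETWORK_DISCONNECT)
tracker.track("cell_A", 0.8, Outcome.NETWORK_DISCONNECT)
tracker.track("cell_B", 0.6, Outcome.CHANGED_TO_OTHER_LINK)
tracker.track("cell_B", 0.7, Outcome.CHANGED_TO_OTHER_LINK)
chosen = tracker.decide(["cell_A", "cell_B"])
```

In this usage, `cell_A` earned higher rewards but always ended in a network-initiated disconnect, so under the assumed penalty the tracker selects `cell_B`. A different weighting of outcomes against rewards would be equally consistent with the claim language.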