CPC H04B 7/0456 (2013.01) [G06N 3/08 (2013.01)] | 20 Claims |
1. A method, performed by one or more first devices, for providing a precoder selection policy for a multi-antenna transmitter to transmit data over a communication channel of a wireless communication network, the method comprising:
applying machine learning in a form of reinforcement learning involving adaptation of an action value function configured to compute an action value based on action information and state information, where action information is information indicative of a precoder of the multi-antenna transmitter and state information is information indicative of a state relating to at least the communication channel, the adaptation of the action value function being further based on reward information provided by a reward function, where reward information is information indicative of how successfully data is transmitted over the communication channel, the action information relating to an identifier identifying a precoder of a predefined set of precoders; and
after the training based on reinforcement learning, providing the adapted action value function and using it, post-training, to select the precoder for the multi-antenna transmitter.
|
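For illustration only, the procedure recited in claim 1 can be sketched as a small tabular learner. This is a minimal sketch under stated assumptions, not the claimed implementation: the DFT codebook, the line-of-sight channel model, the beamforming-gain reward, the single-step update without a discount factor, and all names (`dft_codebook`, `channel_for_state`, `reward`, `train`, `select_precoder`) are hypothetical choices made here and do not appear in the claim.

```python
import cmath
import math
import random

N_TX = 4  # number of transmit antennas (assumed for this sketch)


def dft_codebook(n_tx):
    """Predefined set of precoders: unit-norm DFT beams, indexed 0..n_tx-1."""
    return [[cmath.exp(-2j * math.pi * k * m / n_tx) / math.sqrt(n_tx)
             for m in range(n_tx)] for k in range(n_tx)]


def channel_for_state(state, n_tx):
    """Hypothetical channel model: one steering vector per discrete state."""
    return [cmath.exp(-2j * math.pi * state * m / n_tx) for m in range(n_tx)]


def reward(h, w):
    """Normalized beamforming gain |h^H w|^2 / n_tx, used here as a stand-in
    for 'how successfully data is transmitted' (an assumption)."""
    inner = sum(hm.conjugate() * wm for hm, wm in zip(h, w))
    return abs(inner) ** 2 / len(h)


def train(episodes=5000, alpha=0.1, epsilon=0.2, seed=0):
    """Adapt a tabular action value function Q[state][precoder_index]."""
    rng = random.Random(seed)
    codebook = dft_codebook(N_TX)
    n_states = n_actions = N_TX
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = rng.randrange(n_states)  # observe state information
        if rng.random() < epsilon:   # explore: random precoder identifier
            a = rng.randrange(n_actions)
        else:                        # exploit: current best precoder
            a = max(range(n_actions), key=lambda i: q[s][i])
        # Noisy reward information for the chosen precoder.
        r = reward(channel_for_state(s, N_TX), codebook[a]) + rng.gauss(0, 0.05)
        # Single-step update (no discounting; a simplifying assumption).
        q[s][a] += alpha * (r - q[s][a])
    return q


def select_precoder(q, state):
    """Post-training selection: identifier of the highest-valued precoder."""
    return max(range(len(q[state])), key=lambda a: q[state][a])


if __name__ == "__main__":
    q_table = train()
    for s in range(N_TX):
        print("state", s, "-> precoder index", select_precoder(q_table, s))
```

In this sketch the action is the index of a precoder in the predefined set, the state is a discrete label for the channel, and `select_precoder` corresponds to using the adapted action value function after training. A practical system would likely replace the table with a function approximator and derive the reward from link-level feedback.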