CPC H04L 41/16 (2013.01) [G06N 20/00 (2019.01); H04L 41/145 (2013.01)] | 25 Claims

1. A method of operation of a client node for implementing reinforcement learning, RL, the client node instructing actions in an environment in accordance with a policy, the method comprising:
identifying one or more critical states of the environment for which a current policy provides unreliable actions, the unreliable actions being actions that do not have predefined effects on the environment;
initiating transmission to a server of a retraining request, the retraining request comprising information relating to the one or more critical states, the server retraining a machine learning system, MLS, responsible for the current policy, based on the information relating to the one or more critical states in the retraining request, to generate a new policy;
receiving from the server the new policy, the new policy being generated by the server using RL based on the information relating to the one or more critical states; and
instructing actions in the environment in accordance with the new policy.
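The client/server loop in claim 1 can be illustrated with the following minimal sketch. It is not taken from the patent. It relies on several illustrative assumptions that the claim does not state: a toy corridor environment; an "unreliable" action detected as one whose observed next state differs from its predefined effect; tabular Q-learning as the server's RL method; and a direct function call in place of the network transmission. All names (`identify_critical_states`, `retrain`, `RetrainingRequest`, `client_cycle`, `BLOCKED`, etc.) are hypothetical.

```python
import random
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical toy environment: a corridor of states 0..GOAL. The predefined
# effect of action a is to advance a cells, but the transition
# (state 2, action 1) is blocked and leaves the agent in place.
GOAL = 5
ACTIONS = (1, 2)
BLOCKED = {(2, 1)}


def expected_next(state: int, action: int) -> int:
    """Predefined (intended) effect of an action on the environment."""
    return min(state + action, GOAL)


def step(state: int, action: int) -> tuple[int, float, bool]:
    """Actual dynamics: returns (next_state, reward, done)."""
    nxt = state if (state, action) in BLOCKED else expected_next(state, action)
    done = nxt == GOAL
    return nxt, (0.0 if done else -1.0), done


@dataclass
class RetrainingRequest:
    critical_states: list[int]  # information relating to the critical states


# ---------------- client side (hypothetical) ----------------
def identify_critical_states(policy: dict[int, int], start: int = 0,
                             max_steps: int = 20) -> set[int]:
    """Roll out the policy and flag states whose action has an unexpected effect."""
    critical, s = set(), start
    for _ in range(max_steps):
        if s == GOAL:
            break
        a = policy[s]
        nxt, _, _ = step(s, a)
        if nxt != expected_next(s, a):
            critical.add(s)
            break  # the current policy cannot make progress past this state
        s = nxt
    return critical


def rollout(policy: dict[int, int], start: int = 0, max_steps: int = 20) -> list[int]:
    """Instruct actions according to `policy` and return the visited states."""
    s, path = start, [start]
    for _ in range(max_steps):
        if s == GOAL:
            break
        s, _, _ = step(s, policy[s])
        path.append(s)
    return path


# ---------------- server side (hypothetical) ----------------
def retrain(policy: dict[int, int], request: RetrainingRequest, episodes: int = 200,
            alpha: float = 0.5, gamma: float = 0.9, epsilon: float = 0.2,
            max_steps: int = 20, seed: int = 0) -> dict[int, int]:
    """Tabular Q-learning whose episodes start in the reported critical states."""
    if not request.critical_states:
        return dict(policy)
    rng = random.Random(seed)
    q: defaultdict = defaultdict(float)

    def greedy(s: int) -> int:
        return max(ACTIONS, key=lambda a: q[(s, a)])

    for ep in range(episodes):
        s = request.critical_states[ep % len(request.critical_states)]
        for _ in range(max_steps):
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(s)
            nxt, r, done = step(s, a)
            target = r if done else r + gamma * max(q[(nxt, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = nxt
            if done:
                break
    new_policy = dict(policy)  # keep the current actions outside the critical states
    for s in request.critical_states:
        new_policy[s] = greedy(s)
    return new_policy


def client_cycle(policy: dict[int, int]) -> dict[int, int]:
    """Identify critical states, request retraining, and adopt the new policy."""
    critical = identify_critical_states(policy)
    if not critical:
        return policy
    request = RetrainingRequest(sorted(critical))
    return retrain(policy, request)  # stands in for the request/response round trip


if __name__ == "__main__":
    current = {s: 1 for s in range(GOAL)}  # always advance one cell
    print(rollout(current)[:5])  # stuck at state 2
    new = client_cycle(current)
    print(rollout(new))  # [0, 1, 2, 4, 5]
```

In this sketch the server overwrites the policy only at the reported critical states and keeps the current actions elsewhere. That is one possible reading of "based on the information relating to the one or more critical states". A fuller system might instead retrain the whole MLS and ship a complete replacement policy.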