US 12,244,470 B2
	Machine-learning models and apparatus
Jean Paulo Martins, Indaiatuba (BR); Ricardo da Silva Souza, Indaiatuba (BR); Klaus Raizer, Indaiatuba (BR); Alberto Hata, Campinas SP (BR); and Amadeu Do Nascimento Junior, Indaiatuba (BR)
Assigned to Telefonaktiebolaget LM Ericsson (publ), Stockholm (SE)
Appl. No. 18/252,765
Filed by Telefonaktiebolaget LM Ericsson (publ), Stockholm (SE)
PCT Filed Nov. 13, 2020, PCT No. PCT/EP2020/082139 § 371(c)(1), (2) Date May 12, 2023, PCT Pub. No. WO2022/100856, PCT Pub. Date May 19, 2022.
Prior Publication US 2024/0007359 A1, Jan. 4, 2024
Int. Cl. G06F 15/173 (2006.01); G06N 20/00 (2019.01); H04L 41/14 (2022.01); H04L 41/16 (2022.01)

CPC H04L 41/16 (2013.01) [G06N 20/00 (2019.01); H04L 41/145 (2013.01)]

25 Claims

1. A method of operation of a client node for implementing reinforcement learning, RL, the client node instructing actions in an environment in accordance with a policy, the method comprising:

identifying one or more critical states of the environment for which a current policy provides unreliable actions, the unreliable actions being actions that do not have predefined effects on the environment;

initiating transmission to a server of a retraining request, the retraining request comprising information relating to the one or more critical states, the server retraining a machine learning system, MLS, responsible for the current policy, based on the information relating to the one or more critical states in the retraining request, to generate a new policy;

receiving from the server the new policy, the new policy being generated by the server using RL based on the information relating to the one or more critical states; and

instructing actions in the environment in accordance with the new policy.
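The steps of claim 1 can be illustrated with a minimal Python sketch. Everything below is a hypothetical assumption and does not come from the patent:

- The toy corridor environment.
- The test for a critical state. Here a state is critical when the action chosen there does not produce its predefined effect.
- The contents of the retraining request. It carries the reported state, the action, and the observed next state.
- The server's retraining method. The server corrects its own simulator with the reported transitions, then runs synchronous tabular Q-learning over that simulator.

```python
from dataclasses import dataclass, field

# Hypothetical toy environment: a corridor of cells 0..5 with the goal in cell 5.
ACTIONS = {"left": -1, "right": +1, "jump": +2}  # predefined effect of each action
GOAL = 5


def expected_next(s: int, a: str) -> int:
    """Next state predicted by the action's predefined effect (clipped to the corridor)."""
    return min(max(s + ACTIONS[a], 0), GOAL)


def real_step(s: int, a: str) -> int:
    """True dynamics: an obstacle in cell 2 blocks 'right' (unknown to the server)."""
    return s if (s, a) == (2, "right") else expected_next(s, a)


@dataclass
class RetrainingRequest:
    # Hypothetical payload: critical state -> (action chosen, next state actually observed)
    critical_states: dict[int, tuple[str, int]] = field(default_factory=dict)


class Server:
    def __init__(self) -> None:
        self.overrides: dict[tuple[int, str], int] = {}  # corrections to the simulator

    def simulate(self, s: int, a: str) -> int:
        return self.overrides.get((s, a), expected_next(s, a))

    def retrain(self, req: RetrainingRequest, sweeps: int = 20, gamma: float = 0.9) -> dict[int, str]:
        # Use the information about the critical states to correct the simulator.
        for s, (a, observed) in req.critical_states.items():
            self.overrides[(s, a)] = observed
        # Synchronous tabular Q-learning with learning rate 1 (valid because the
        # simulator is deterministic): every (state, action) pair is updated per sweep.
        q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
        for _ in range(sweeps):
            for s in range(GOAL):  # the goal cell is terminal
                for a in ACTIONS:
                    nxt = self.simulate(s, a)
                    if nxt == GOAL:
                        q[(s, a)] = 1.0  # reward 1 for reaching the goal
                    else:
                        q[(s, a)] = gamma * max(q[(nxt, b)] for b in ACTIONS)
        # The new policy is greedy with respect to the retrained Q-values.
        return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}


class Client:
    def __init__(self, server: Server, policy: dict[int, str]) -> None:
        self.server, self.policy = server, policy

    def run_episode(self, start: int = 0, max_steps: int = 10) -> tuple[list[int], RetrainingRequest]:
        """Instruct actions per the current policy and flag critical states."""
        s, path, req = start, [start], RetrainingRequest()
        for _ in range(max_steps):
            if s == GOAL:
                break
            a = self.policy[s]
            nxt = real_step(s, a)
            if nxt != expected_next(s, a):  # the action lacked its predefined effect
                req.critical_states[s] = (a, nxt)
            s = nxt
            path.append(s)
        return path, req

    def operate(self) -> list[int]:
        path, req = self.run_episode()
        if req.critical_states:  # send a retraining request to the server
            self.policy = self.server.retrain(req)  # receive the new policy
            path, _ = self.run_episode()  # act according to the new policy
        return path


client = Client(Server(), {s: "right" for s in range(GOAL)})  # hypothetical initial policy
print(client.operate())  # [0, 1, 3, 5]
```

With the initial "always move right" policy, the client gets stuck in cell 2 and reports that cell as a critical state. It then follows the retrained policy along the path 0, 1, 3, 5. Correcting a simulator with the reported transitions is only one possible use of the "information relating to the one or more critical states". Any RL algorithm on the server could take the place of the Q-learning sweeps.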