US 12,278,841 B2
Secure exploration for reinforcement learning
Harm Hendrik Van Seijen, Montreal (CA); and Seyed Mehdi Fatemi Booshehri, Montreal (CA)
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC, Redmond, WA (US)
Filed by MICROSOFT TECHNOLOGY LICENSING, LLC, Redmond, WA (US)
Filed on Feb. 17, 2023, as Appl. No. 18/111,155.
Application 18/111,155 is a continuation of application No. 16/554,525, filed on Aug. 28, 2019, granted, now Pat. No. 11,616,813.
Claims priority of provisional application 62/725,981, filed on Aug. 31, 2018.
Prior Publication US 2023/0199031 A1, Jun. 22, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. H04L 9/40 (2022.01); G06N 5/043 (2023.01); G06N 20/00 (2019.01)
CPC H04L 63/20 (2013.01) [G06N 5/043 (2013.01); G06N 20/00 (2019.01)] 20 Claims
OG exemplary drawing
 
1. A computer-implemented method comprising:
determining a current state of an agent within an environment of a decision process that models a performance of a task;
determining a plurality of actions based on the decision process, wherein each of the plurality of actions is available for execution at the current state of the agent;
employing a secured policy to select a first action of the plurality of actions, wherein the secured policy provides a score for each of the plurality of actions that is based on a probability that performing the action at the current state of the agent will transition the current state of the agent to a dead-end state of the agent, wherein each of a plurality of possible trajectories from the dead-end state terminates at an undesired terminal state of the agent; and
executing the first action for the performance of the task.
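As a rough illustration only (not the patented implementation), the sketch below shows one way the secured-policy selection recited in claim 1 could be realized: each available action is scored by an estimate of the probability that it leads to a dead-end state, the base exploration policy is restricted by those scores, and the first action is sampled from the resulting distribution. All function and variable names (secured_action_probabilities, dead_end_prob, base_policy_prob, threshold) are illustrative assumptions, as is the use of a separately learned dead-end estimator.

```python
# Illustrative sketch (assumptions, not the claimed method's implementation):
# select an action under a "secured" policy that down-weights actions likely
# to lead to a dead-end state, i.e., a state from which every possible
# trajectory terminates at an undesired terminal state.

import numpy as np


def secured_action_probabilities(dead_end_prob, base_policy_prob, threshold=1.0):
    """Combine a base exploration policy with per-action dead-end estimates.

    dead_end_prob[a]    -- estimated probability that taking action `a` in the
                           current state transitions the agent to a dead-end
                           state (assumed to come from a learned estimator).
    base_policy_prob[a] -- probability the unrestricted exploration policy
                           assigns to action `a`.
    threshold           -- scales how much probability mass a risky action may
                           keep; each action is capped at
                           (1 - dead_end_prob[a]) * threshold.
    """
    dead_end_prob = np.asarray(dead_end_prob, dtype=float)
    base_policy_prob = np.asarray(base_policy_prob, dtype=float)

    # Score each action: 1 means no estimated risk of entering a dead end,
    # 0 means the action is estimated to always lead to one.
    safety_score = np.clip(1.0 - dead_end_prob, 0.0, 1.0) * threshold

    # Restrict the base policy so no action receives more probability than
    # its safety score allows, then renormalize.
    restricted = np.minimum(base_policy_prob, safety_score)
    total = restricted.sum()
    if total == 0.0:
        # Every action looks unsafe; fall back to the least risky one.
        restricted = np.zeros_like(safety_score)
        restricted[np.argmax(safety_score)] = 1.0
        total = 1.0
    return restricted / total


# Example: three actions available at the current state, uniform base policy,
# action 2 estimated to lead to a dead end 90% of the time.
probs = secured_action_probabilities(
    dead_end_prob=[0.1, 0.2, 0.9],
    base_policy_prob=[1 / 3, 1 / 3, 1 / 3],
)
rng = np.random.default_rng(0)
first_action = rng.choice(len(probs), p=probs)  # action executed for the task
```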