| CPC H04L 63/20 (2013.01) [G06N 5/043 (2013.01); G06N 20/00 (2019.01)] | 20 Claims |

|
1. A computer-implemented method comprising:
determining a current state of an agent within an environment of a decision process that models a performance of a task;
determining a plurality of actions based on the decision process, wherein each of the plurality of actions is available for execution at the current state of the agent;
employing a secured policy to select a first action of the plurality of actions, wherein the secured policy provides a score for each of the plurality of actions that is based on a probability that performing the action at the current state of the agent will transition the current state of the agent to a dead-end state of the agent, wherein each of a plurality of possible trajectories from the dead-end state terminates at an undesired terminal state of the agent; and
executing the first action for the performance of the task.
|
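
The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The claim does not say how the score is derived from the dead-end probability or how the first action is chosen from the scores, so this sketch assumes a score of one minus that probability and picks the highest-scoring action. The toy transition table, the set of known dead-end states, and all function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical toy decision process (an assumption, not from the claim):
# TRANSITIONS[state][action] maps each next state to its probability.
TRANSITIONS: Dict[str, Dict[str, Dict[str, float]]] = {
    "s0": {
        "left": {"s1": 0.9, "dead": 0.1},
        "right": {"s1": 0.4, "dead": 0.6},
        "stay": {"s0": 1.0},
    },
}

# Assumed to be known in advance: states from which every possible
# trajectory terminates at an undesired terminal state.
DEAD_END_STATES = {"dead"}


def available_actions(state: str) -> List[str]:
    """Actions available for execution at the given state."""
    return sorted(TRANSITIONS[state])


def dead_end_probability(state: str, action: str) -> float:
    """Probability that taking `action` at `state` leads to a dead-end state."""
    return sum(
        prob
        for next_state, prob in TRANSITIONS[state][action].items()
        if next_state in DEAD_END_STATES
    )


def secured_scores(state: str) -> Dict[str, float]:
    """Score each available action; here, 1 minus its dead-end probability."""
    return {
        action: 1.0 - dead_end_probability(state, action)
        for action in available_actions(state)
    }


def select_action(state: str) -> str:
    """Select the highest-scoring action (an assumed selection rule)."""
    scores = secured_scores(state)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    current_state = "s0"
    print(secured_scores(current_state))
    print(select_action(current_state))
```

In this toy table, `stay` carries no probability of reaching the dead-end state, so it receives the highest score and is selected. A real system would estimate these probabilities rather than read them from a fixed table.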