| CPC G06F 21/604 (2013.01) [G06F 18/2185 (2023.01); G06N 3/092 (2023.01); H04L 63/20 (2013.01)] | 14 Claims |

|
1. A non-transitory computer readable medium comprising instructions which, when executed by one or more hardware processors, cause performance of operations comprising:
obtaining a set of data representing a computing environment, the computing environment comprising a first set of computing resources;
receiving a selection to implement a first set of data security standards in the computing environment;
identifying a first set of security requirements, associated with the first set of data security standards, for the first set of computing resources;
performing deep reinforcement learning to generate a target sequence of actions for implementing the first set of data security standards in the computing environment, at least by:
iteratively
(a) applying a plurality of candidate actions, associated with a respective plurality of security requirements, from among the first set of security requirements, to a current candidate state of the computing environment to generate a plurality of next candidate states of the computing environment,
(b) identifying a plurality of candidate reward values associated respectively with the plurality of next candidate states of the computing environment, and
(c) applying a target action, from among the plurality of candidate actions, to the current candidate state to generate a target next candidate state of the computing environment and a target reward associated with the target next candidate state; and
generating the target sequence of actions based on a set of target rewards associated with a particular set of target next candidate states resulting from performing the target sequence of actions; and
executing the target sequence of actions to provision the computing environment according to the first set of data security standards.
|
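
The iterative steps (a) through (c) of claim 1 can be illustrated with a small Python sketch. It is not the claimed implementation: a greedy one-step lookahead with a hand-written reward function stands in for the deep reinforcement learning policy, and the state model (a set of satisfied requirements), the `Action` fields, the `cost` values, and the function names `apply_action`, `reward`, and `plan` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Assumption: a candidate state is modeled as the set of security
# requirements already satisfied in the computing environment.
State = FrozenSet[str]


@dataclass(frozen=True)
class Action:
    name: str          # hypothetical action identifier
    requirement: str   # security requirement the action satisfies
    cost: float        # hypothetical cost of applying the action


def apply_action(state: State, action: Action) -> State:
    """Return the next candidate state after applying one action."""
    return state | {action.requirement}


def reward(state: State, next_state: State, action: Action) -> float:
    """Hand-written reward (assumption): newly satisfied requirements minus cost."""
    return (len(next_state) - len(state)) - action.cost


def plan(initial: State, actions: List[Action],
         required: FrozenSet[str]) -> Tuple[List[Action], List[float]]:
    """Greedy stand-in for the learned policy; returns (sequence, target rewards)."""
    state = initial
    sequence: List[Action] = []
    target_rewards: List[float] = []
    while not required <= state:
        candidates = [a for a in actions
                      if a.requirement in required and a.requirement not in state]
        if not candidates:
            break  # some requirement has no action that satisfies it
        # (a) apply every candidate action to the current candidate state
        next_states = [apply_action(state, a) for a in candidates]
        # (b) identify a candidate reward for each next candidate state
        cand_rewards = [reward(state, ns, a)
                        for a, ns in zip(candidates, next_states)]
        # (c) apply the target action (here: the highest-reward candidate)
        best = max(range(len(candidates)), key=lambda i: cand_rewards[i])
        state = next_states[best]
        sequence.append(candidates[best])
        target_rewards.append(cand_rewards[best])
    return sequence, target_rewards


if __name__ == "__main__":
    # Hypothetical requirements and actions, for illustration only.
    acts = [
        Action("enable_encryption", "encryption_at_rest", 0.1),
        Action("require_mfa", "mfa", 0.3),
        Action("enable_audit_logging", "audit_logging", 0.2),
    ]
    seq, rewards = plan(frozenset(), acts, frozenset(a.requirement for a in acts))
    print([a.name for a in seq], rewards)
```

In a deep reinforcement learning setting, the `max` over hand-written rewards would presumably be replaced by a trained network that estimates the value of each next candidate state, and the final step of the claim would execute the returned sequence against the real environment.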