US 12,333,027 B2
Machine learning for implementing policies in a computing environment
Johnson Manuel-Devadoss, San Antonio, TX (US)
Assigned to Oracle International Corporation, Redwood Shores, CA (US)
Filed by Oracle International Corporation, Redwood Shores, CA (US)
Filed on Mar. 3, 2022, as Appl. No. 17/685,561.
Prior Publication US 2023/0283643 A1, Sep. 7, 2023
Int. Cl. G06F 21/60 (2013.01); G06F 18/21 (2023.01); G06N 3/092 (2023.01); H04L 9/40 (2022.01)

CPC G06F 21/604 (2013.01) [G06F 18/2185 (2023.01); G06N 3/092 (2023.01); H04L 63/20 (2013.01)]

14 Claims

1. A non-transitory computer readable medium comprising instructions which, when executed by one or more hardware processors, cause performance of operations comprising:

obtaining a set of data representing a computing environment, the computing environment comprising a first set of computing resources;

receiving a selection to implement a first set of data security standards in the computing environment;

identifying a first set of security requirements, associated with the first set of data security standards, for the first set of computing resources;

performing deep reinforcement learning to generate a target sequence of actions for implementing the first set of data security standards in the computing environment, at least by:

iteratively (a) applying a plurality of candidate actions, associated with a respective plurality of security requirements, from among the first set of security requirements, to a current candidate state of the computing environment to generate a plurality of next candidate states of the computing environment, (b) identifying a plurality of candidate reward values associated respectively with the plurality of next candidate states of the computing environment, and (c) applying a target action, from among the plurality of candidate actions, to the current candidate state to generate a target next candidate state of the computing environment and a target reward associated with the target next candidate state; and

generating the target sequence of actions based on a set of target rewards associated with a particular set of target next candidate states resulting from performing the target sequence of actions; and

executing the target sequence of actions to provision the computing environment according to the first set of data security standards.
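
The iterative steps (a) through (c) recited above can be illustrated with a small sketch. It is not the claimed implementation: a hand-written reward function and greedy selection stand in for the deep reinforcement learning model, and the requirement names, the PREREQS table, and the functions apply_action, reward, and plan are all hypothetical, introduced only for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical state encoding: the set of security requirements already applied.
State = FrozenSet[str]

# Hypothetical requirements and their prerequisites (illustration only).
PREREQS: Dict[str, Set[str]] = {
    "encrypt_at_rest": set(),
    "rotate_keys": {"encrypt_at_rest"},
    "enable_audit_log": set(),
}


def apply_action(state: State, action: str) -> State:
    """Return the next candidate state after applying one candidate action."""
    return state | {action}


def reward(state: State, action: str) -> float:
    """Assumed reward: +1 if the action's prerequisites are met, else -1.

    In a deep RL setting a learned value estimate would replace this function.
    """
    return 1.0 if PREREQS[action] <= state else -1.0


def plan(requirements: List[str]) -> Tuple[List[str], List[float]]:
    """Greedily build a target sequence of actions and its target rewards."""
    state: State = frozenset()
    remaining = list(requirements)
    sequence: List[str] = []
    rewards: List[float] = []
    while remaining:
        # (a) apply every candidate action to the current candidate state
        candidates = [(a, apply_action(state, a)) for a in remaining]
        # (b) identify a candidate reward for each next candidate state
        scored = [(reward(state, a), a, nxt) for a, nxt in candidates]
        # (c) pick the target action (highest reward, ties by list order)
        best_reward, best_action, best_state = max(scored, key=lambda t: t[0])
        state = best_state
        sequence.append(best_action)
        rewards.append(best_reward)
        remaining.remove(best_action)
    return sequence, rewards


if __name__ == "__main__":
    seq, target_rewards = plan(["rotate_keys", "encrypt_at_rest", "enable_audit_log"])
    print(seq, target_rewards)
```

In this sketch the returned sequence corresponds to the target sequence of actions and the returned list to the set of target rewards; executing the sequence against a real environment is outside its scope.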