US 12,462,251 B2
Automatic modification of transaction constraints
Govind Gopinathan Nair, Jersey City, NJ (US); Mohini Shrivastava, Bhopal (IN); Saurabh Arora, Athens, GA (US); and Jason P. Somrak, North Royalton, OH (US)
Assigned to Oracle International Corporation, Redwood Shores, CA (US); and Oracle Financial Services Software Limited, Mumbai (IN)
Filed by ORACLE FINANCIAL SERVICES SOFTWARE LIMITED, Mumbai (IN); and Oracle International Corporation, Redwood Shores, CA (US)
Filed on Jun. 10, 2022, as Appl. No. 17/837,188.
Prior Publication US 2023/0401578 A1, Dec. 14, 2023
Int. Cl. G06Q 20/40 (2012.01); G06F 18/21 (2023.01); G06N 20/00 (2019.01)

CPC G06Q 20/401 (2013.01) [G06F 18/217 (2023.01); G06N 20/00 (2019.01); G06Q 20/405 (2013.01)]

20 Claims

1. A computer-implemented method, comprising:

training a reinforcement learning agent to learn an electronic policy data structure that maps states of an electronic transaction system to actions by the reinforcement learning agent that causes the reinforcement learning agent, when executed by one or more processors, to perform a task in the electronic transaction system without triggering an alert in scenarios of a monitoring system that are configured to hinder the task;

during the training of the reinforcement learning agent, iteratively:

(i) executing the reinforcement learning agent using the one or more processors to perform a transaction by the reinforcement learning agent that attempts to evade the scenarios of the monitoring system, wherein the reinforcement learning agent performs the transaction by:

(a) electronically retrieving a current state of variables that are evaluated by the scenarios from a variables electronic data structure in memory,

(b) electronically retrieving a mapping of the current state to the transaction from the electronic policy data structure, and

(ii) executing the scenarios by the monitoring system to evaluate whether the current state of variables triggers alerts under the scenarios,

(iii) generating and storing electronic records of the transaction individually in a records electronic data structure in memory that includes values for type of transaction channel and alert statuses for the scenarios, and

(iv) adjusting the electronic policy data structure to increase a reward function that provides a relatively larger penalty when one of the scenarios is triggered, a relatively smaller penalty for performing the transaction, and a reward for completing the task;

determining a usage frequency for a transaction channel in a subset of the transactions in which the attempts to evade the one or more scenarios are successful based on transactions for the transaction channel that are recorded in the records electronic data structure with no alerts for the scenarios, wherein the usage frequency measures how often the reinforcement learning agent uses the transaction channel to successfully evade the scenarios;

comparing the usage frequency to an expected usage frequency for the transaction channel;

automatically modifying, in the scenarios, a transaction constraint on the transaction channel to prevent, by the transaction monitoring system, use of the transaction channel in a manner described by the electronic policy data structure;

automatically deploying the transaction constraint into the scenarios that are used to monitor a live transaction environment; and

monitoring the live transaction environment based on the transaction constraint.
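
As an illustration only, and not the patented implementation, the reward ordering of step (iv) and the usage-frequency comparison recited above might be sketched in Python as follows. The claim fixes only the relative sizes of the penalties and the reward, so the numeric constants are assumptions. The names `TransactionRecord`, `evasive_usage_frequency`, and `overused_channels` are hypothetical, as are the choice to measure usage frequency as a share of alert-free transactions and the `expected` baseline mapping.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed magnitudes; the claim only requires alert penalty > transaction penalty.
ALERT_PENALTY = -100.0  # relatively larger penalty: a scenario was triggered
STEP_PENALTY = -1.0     # relatively smaller penalty: cost of performing a transaction
TASK_REWARD = 50.0      # reward for completing the task


def reward(alert_triggered: bool, task_completed: bool) -> float:
    """Reward for one transaction, following the ordering in step (iv)."""
    total = STEP_PENALTY
    if alert_triggered:
        total += ALERT_PENALTY
    if task_completed:
        total += TASK_REWARD
    return total


@dataclass(frozen=True)
class TransactionRecord:
    """Hypothetical record shape: channel type plus one alert status per scenario."""
    channel: str
    alerts: Tuple[bool, ...]


def evasive_usage_frequency(records: List[TransactionRecord], channel: str) -> float:
    """Share of alert-free (successfully evasive) transactions that used `channel`."""
    evasive = [r for r in records if not any(r.alerts)]
    if not evasive:
        return 0.0
    return sum(r.channel == channel for r in evasive) / len(evasive)


def overused_channels(
    records: List[TransactionRecord], expected: Dict[str, float]
) -> Dict[str, float]:
    """Channels whose evasive usage frequency exceeds the expected frequency."""
    flagged = {}
    for channel, expected_freq in expected.items():
        freq = evasive_usage_frequency(records, channel)
        if freq > expected_freq:
            flagged[channel] = freq
    return flagged
```

In this sketch, a channel returned by `overused_channels` would be a candidate for a tightened transaction constraint in the scenarios. How the constraint itself is modified and deployed is not shown here.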