| CPC G06N 20/00 (2019.01) [G06F 18/24147 (2023.01)] | 15 Claims |

|
1. A method, comprising:
obtaining, with at least one processor, a training dataset Xtrain including a plurality of source samples including a plurality of labeled normal samples and a plurality of labeled anomaly samples;
executing, with the at least one processor, a training episode by:
(i) initializing a timestamp t;
(ii) receiving, from an actor network π of an actor critic framework including the actor network π and a critic network Q, an action vector at for the timestamp t, wherein the actor network π is configured to generate the action vector at based on a state st, wherein the state st is determined based on a current pair of source samples of the plurality of source samples, and wherein the action vector at includes a size of a nearest neighborhood k, a composition ratio α, a number of oversampling n, and a termination probability ∈;
(iii) combining the current pair of source samples according to the composition ratio α and the number of oversampling n to generate a labeled synthetic sample xsyn associated with a label ysyn;
(iv) training, using the labeled synthetic sample xsyn and the label ysyn, a machine learning classifier ϕ;
(v) obtaining, based on the size of a nearest neighborhood k, source samples in the k-nearest neighborhood of the labeled synthetic sample xsyn;
(vi) generating, with the machine learning classifier ϕ, for the source samples in the k-nearest neighborhood of the labeled synthetic sample xsyn and a subset of the plurality of source samples of the training dataset Xtrain in a validation dataset Xval, a plurality of classifier outputs;
(vii) selecting, from the source samples in the k-nearest neighborhood of the labeled synthetic sample xsyn, a next pair of source samples;
(viii) storing, in a memory buffer, the state st, the action vector at, a next state st+1, and a reward rt, wherein the next state st+1 is determined based on the next pair of source samples, and wherein the reward rt is determined based on the plurality of classifier outputs;
(ix) determining whether the termination probability ∈ satisfies a termination threshold;
(x) in response to determining that the termination probability ∈ fails to satisfy the termination threshold, incrementing the timestamp t and, for a number of training steps S:
training the critic network Q according to a critic loss function that depends on the state st, the action vector at, and the reward rt; and
training the actor network π according to an actor loss function that depends on an output of the critic network Q; and
after training the actor network π and the critic network Q for the number of training steps S, returning to step (ii) with the next pair of source samples as the current pair of source samples;
(xi) in response to determining that the termination probability ∈ satisfies the termination threshold, determining whether the number of training episodes executed satisfies a threshold number of training episodes;
(xii) in response to determining that the number of training episodes executed fails to satisfy the threshold number of training episodes, returning to step (i) to execute a next training episode; and
(xiii) in response to determining that the number of training episodes executed satisfies the threshold number of training episodes, providing the machine learning classifier ϕ, wherein the plurality of source samples is associated with a plurality of transactions in a transaction processing network, wherein the plurality of labeled normal samples is associated with a plurality of non-fraudulent transactions of the plurality of transactions, and wherein the plurality of labeled anomaly samples is associated with a plurality of fraudulent transactions of the plurality of transactions;
receiving, with the at least one processor, transaction data associated with a transaction currently being processed in the transaction processing network;
processing, with the at least one processor, using the trained machine learning classifier ϕ, the transaction data to classify the transaction as a fraudulent or non-fraudulent transaction; and
in response to classifying the transaction as a fraudulent transaction, denying, with the at least one processor, authorization of the transaction in the transaction processing network.
|
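
For illustration only, steps (iii), (v), (vii), and (ix) of claim 1 might be sketched as follows. The claim does not fix the combining formula, the distance metric, the pair-selection rule, or the threshold comparison, so the convex combination, the soft label, the Euclidean distance, the uniform random draw, the `>=` comparison, and all function names below are assumptions chosen for the example, not limitations recited in the claim.

```python
import math
import random

def mix_pair(x_i, y_i, x_j, y_j, alpha, n):
    """Step (iii), assumed form: convex combination of the current source pair."""
    x_syn = [alpha * a + (1.0 - alpha) * b for a, b in zip(x_i, x_j)]
    # Assumption: the label is mixed with the same ratio as the features.
    y_syn = alpha * y_i + (1.0 - alpha) * y_j
    # Assumption: "number of oversampling n" means n copies are added for training.
    return [(x_syn, y_syn)] * n

def k_nearest(x_syn, sources, k):
    """Step (v): indices of the k source samples closest to x_syn (Euclidean, assumed)."""
    order = sorted(range(len(sources)), key=lambda i: math.dist(x_syn, sources[i]))
    return order[:k]

def next_pair(neighbor_idx, rng):
    """Step (vii), assumed rule: draw two distinct neighbors uniformly at random."""
    return tuple(rng.sample(neighbor_idx, 2))

def should_terminate(epsilon, threshold=0.5):
    """Step (ix), assumed comparison: the episode ends once epsilon reaches the threshold."""
    return epsilon >= threshold

# Toy data (hypothetical): label 0 = normal sample, label 1 = anomaly sample.
sources = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
labels = [0, 0, 1, 1]

batch = mix_pair(sources[0], labels[0], sources[2], labels[2], alpha=0.25, n=3)
x_syn, y_syn = batch[0]
neighbors = k_nearest(x_syn, sources, k=3)
pair = next_pair(neighbors, random.Random(0))
```

In this sketch the action vector components k, α, n, and the termination probability are passed in directly; in the claimed method they would come from the actor network π at each timestamp t.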