CPC G06N 3/084 (2013.01) [G06N 3/044 (2023.01); G06N 3/045 (2023.01)] | 9 Claims |
1. A machine learning system for embedding attributed sequence data comprising an attribute data part having a fixed number of attribute data elements and a sequence data part having a variable number of sequence data elements into a fixed-length feature representation for a fraud detection system, wherein the machine learning system comprises a multilayer feedforward neural network having an attribute data input layer and an attribute vector output layer which comprises a first predetermined number of units, operatively coupled to a long short-term memory (LSTM) network which comprises a second predetermined number of hidden units which is equal to the first predetermined number of units, wherein an output of the attribute vector output layer is operatively coupled to an input of an attribute vector input layer of the LSTM network, and wherein the attribute vector input layer of the LSTM network comprises a hidden state of the LSTM network at a first evaluation step, the machine learning system comprising:
a computing device; and
a computer-readable storage medium comprising a set of instructions that upon execution by the computing device cause the machine learning system to:
obtain a dataset comprising a plurality of attributed sequences based on user behavior associated with user actions; and
for each attributed sequence in the dataset,
train the multilayer feedforward neural network using the attribute data part of the attributed sequence via back-propagation with respect to a first objective function, and
train the LSTM network using the sequence data part of the attributed sequence via back-propagation with respect to a second objective function,
wherein training of the multilayer feedforward neural network is coupled with training the LSTM network such that, in use, the machine learning system is configured to:
identify common behaviors based on clusters in points in feature space;
determine a fixed-length feature representation of input attributed sequence data based on an analysis of the user behavior associated with the user actions within the fraud detection system that includes the identified common behaviors, wherein the fixed-length feature representation of input attributed sequence data comprises the hidden state of the LSTM network at a final evaluation step;
identify potential fraudulent behaviors based on isolated points within the fixed-length feature representation; and
output the fixed-length feature representation which encodes: i) dependencies between different attribute data elements in the attribute data part, ii) dependencies between different sequence data elements in the sequence data part, and iii) dependencies between attribute data elements and sequence data elements within the attributed sequence data.
|
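The coupling recited in claim 1 can be illustrated with a minimal forward-pass sketch. It is not the patented implementation. It only shows how an attribute vector of d units may serve as the hidden state of an LSTM with d hidden units at the first evaluation step, and how the hidden state at the final step then yields a fixed-length representation regardless of sequence length. The function names (`init_params`, `attribute_vector`, `embed`), the tanh activations, the single hidden feedforward layer, the one-hot sequence encoding, the zero initial cell state, and the random weights are all assumptions made for illustration. Training by back-propagation against the two objective functions is omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(n_attr, n_hidden_ff, d, n_symbols, seed=0):
    """Random weights for illustration only (hypothetical helper, no training)."""
    rng = np.random.default_rng(seed)
    s = 0.1
    return {
        # Feedforward network: attribute input -> hidden layer -> d output units.
        "W1": rng.normal(0, s, (n_attr, n_hidden_ff)),
        "b1": np.zeros(n_hidden_ff),
        "W2": rng.normal(0, s, (n_hidden_ff, d)),
        "b2": np.zeros(d),
        # LSTM with d hidden units; the four gates are stacked along the last axis.
        "Wx": rng.normal(0, s, (n_symbols, 4 * d)),
        "Wh": rng.normal(0, s, (d, 4 * d)),
        "b": np.zeros(4 * d),
    }


def attribute_vector(params, attrs):
    """Map the fixed-size attribute part to a vector of d units."""
    hidden = np.tanh(attrs @ params["W1"] + params["b1"])
    return np.tanh(hidden @ params["W2"] + params["b2"])


def embed(params, attrs, seq):
    """Return the LSTM hidden state after the last sequence element.

    attrs: shape (n_attr,); seq: shape (T, n_symbols), one-hot rows, any T.
    """
    d = params["W2"].shape[1]
    # The attribute vector is used as the hidden state at the first step.
    h = attribute_vector(params, attrs)
    c = np.zeros(d)  # assumption: the cell state starts at zero
    for x in seq:
        z = x @ params["Wx"] + h @ params["Wh"] + params["b"]
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g
        h = o * np.tanh(c)
    # The final hidden state is the fixed-length feature representation.
    return h


params = init_params(n_attr=5, n_hidden_ff=8, d=6, n_symbols=4)
attrs = np.ones(5)
short_seq = np.eye(4)[[0, 2, 1]]             # 3 sequence elements
long_seq = np.eye(4)[[3, 3, 0, 1, 2, 0, 1]]  # 7 sequence elements
print(embed(params, attrs, short_seq).shape, embed(params, attrs, long_seq).shape)
```

Because the attribute vector seeds the recurrent state, every later hidden state depends on both the attributes and the sequence elements seen so far, which is one way to read item iii) of the claim. The requirement that the two unit counts be equal is what lets the attribute vector be used as the hidden state without an extra projection.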