CPC H04L 9/008 (2013.01) [G06N 3/08 (2013.01)] | 7 Claims |
1. A processor implemented method for classifying a website uniform resource locator (URL), comprising:
providing the website URL as an input URL to a client machine, wherein the input URL is kept at a predefined character length, and wherein the method is implemented in a split architecture that includes the client machine and a server;
extracting, by the client machine, a feature vector from the input URL;
encrypting, by the client machine, the feature vector, wherein the encryption generates a representative encrypted feature vector of the input URL, wherein the client machine shares a public key of the client machine with the server and retains a private key of the client machine as a secret key, and wherein a plurality of encryptions is performed using the public key of the client machine;
sending, by the client machine, the representative encrypted feature vector of the input URL to the server;
computing, by the server, encrypted intermediate computation results by applying:
at least one of a logistic regression (LR) based technique or a hybrid of a deep neural network (DNN) based technique and the LR based technique on the representative encrypted feature vector using a polynomial approximation of an activation function, wherein the hybrid of the DNN based technique and the LR based technique comprises:
extracting, by the server, a plurality of features using an embedding matrix and a convolution layer from a URLNet; and
training, by the server, an LR based classifier using the extracted plurality of features, wherein
the DNN based technique further comprises computing an encrypted intermediate computation result of a feed forward network of the DNN on encrypted feature vectors of the input URL using a polynomial approximation of an activation function, and
the activation function includes at least one of a Hyperbolic Tangent Function (Tanh), a leaky Rectified Linear Unit (ReLU), and a Softmax function, wherein
the logistic regression based technique uses a Sigmoid function,
the sigmoid function is computed as $0.5 + 0.253551x - 0.00148x^{2} - 0.02587x^{3} + 0.00158x^{4} + 0.00371x^{5}$,
‘x’ is an output of the DNN,
the logistic regression based feature extraction method further comprises:
identifying unique n-grams in the input URL from among the possible n-grams, wherein the unique n-grams are treated as a bag of n-grams of a predefined length; and
dividing the input URL into n-grams, wherein each input URL is mapped to generate the feature vector of the predefined length, and wherein the mapping indicates the presence or absence of each n-gram with 1 or 0, respectively;
sending, by the server, the encrypted intermediate computation results to the client machine;
decrypting, by the client machine, the encrypted intermediate computation results using a fully homomorphic encryption (FHE) method, wherein a plurality of decryptions is performed at the client machine using the private key of the client machine; and
classifying, by the server and the client machine, the input URL as one of malicious or benign based on the mapping and the decrypted intermediate computation results.
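The n-gram feature extraction recited in the claim can be sketched in plain Python. The claimed steps are a fixed-length input URL, a bag of unique n-grams, and a 1/0 presence vector. This is a minimal illustration under stated assumptions. The claim gives no concrete values, so the 200-character cap (read here as truncation), the trigram size, and the helper names `ngrams`, `build_vocabulary` and `feature_vector` are all hypothetical. The vocabulary is also assumed to be built from a set of training URLs.

```python
# Hypothetical parameters: the claim only says "predefined".
MAX_URL_LENGTH = 200  # hypothetical character cap applied to every input URL
N = 3                 # hypothetical n-gram size (character trigrams)


def ngrams(url: str, n: int = N) -> list[str]:
    """Truncate the URL to MAX_URL_LENGTH and split it into overlapping character n-grams."""
    url = url[:MAX_URL_LENGTH]
    return [url[i:i + n] for i in range(len(url) - n + 1)]


def build_vocabulary(urls: list[str], n: int = N) -> dict[str, int]:
    """Index every unique n-gram seen in the (hypothetical) training URLs: the bag of n-grams."""
    vocab: dict[str, int] = {}
    for url in urls:
        for gram in ngrams(url, n):
            vocab.setdefault(gram, len(vocab))  # first occurrence gets the next free index
    return vocab


def feature_vector(url: str, vocab: dict[str, int], n: int = N) -> list[int]:
    """Fixed-length binary vector: 1 if a vocabulary n-gram occurs in the URL, else 0."""
    vec = [0] * len(vocab)
    for gram in ngrams(url, n):
        if gram in vocab:
            vec[vocab[gram]] = 1
    return vec
```

The vector length equals the vocabulary size, so every URL maps to a vector of the same predefined length no matter how many of its n-grams are known. N-grams that are not in the vocabulary are simply ignored.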
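The degree-5 polynomial stated in the claim can be checked against the exact logistic function in plaintext. Under FHE, the same polynomial would be evaluated on ciphertexts, since it needs only additions and multiplications. The sketch below evaluates it on ordinary floats using Horner's rule, only to show how closely it tracks the sigmoid. The names `sigmoid_poly` and `sigmoid` are illustrative. The claim does not state the input interval over which the coefficients were fitted.

```python
import math

# Coefficients of the polynomial stated in the claim, lowest degree first.
SIGMOID_COEFFS = (0.5, 0.253551, -0.00148, -0.02587, 0.00158, 0.00371)


def sigmoid_poly(x: float) -> float:
    """Evaluate the claimed degree-5 approximation with Horner's rule."""
    result = 0.0
    for c in reversed(SIGMOID_COEFFS):
        result = result * x + c
    return result


def sigmoid(x: float) -> float:
    """Exact logistic function, for comparison."""
    return 1.0 / (1.0 + math.exp(-x))


for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"x={x:+.1f}  poly={sigmoid_poly(x):.4f}  exact={sigmoid(x):.4f}")
```

In this plaintext check, the polynomial is close to the sigmoid near zero (within $10^{-3}$ at $x = \pm 1$). It drifts as $|x|$ grows and already exceeds 1 at $x = 3$. The DNN output $x$ would therefore presumably need to stay within a bounded range.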
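The feed-forward step can be illustrated by a dense layer whose activation is replaced with a low-degree polynomial. The whole forward pass then uses only additions and multiplications, which are the operations an FHE scheme evaluates on ciphertexts. This plaintext sketch assumes the third-order Taylor expansion $\tanh(x) \approx x - x^{3}/3$, which is accurate only near zero. The claim does not give its own coefficients for Tanh, leaky ReLU or Softmax. The function names and toy weights are hypothetical.

```python
import math


def tanh_poly(x):
    """Assumed third-order Taylor approximation of tanh around 0 (not from the claim)."""
    return x - x ** 3 / 3.0


def dense(inputs, weights, biases, activation):
    """One fully connected layer, activation(W x + b), using only additions and multiplications."""
    outputs = []
    for row, bias in zip(weights, biases):
        z = bias
        for w, v in zip(row, inputs):
            z += w * v
        outputs.append(activation(z))
    return outputs


def feed_forward(x, layers, activation=tanh_poly):
    """Apply each (weights, biases) layer in turn with the given activation."""
    for weights, biases in layers:
        x = dense(x, weights, biases, activation)
    return x


# Toy two-layer network with hypothetical weights; polynomial vs exact tanh.
layers = [([[0.2, 0.4], [-0.3, 0.1]], [0.0, 0.05]),
          ([[0.5, -0.5]], [0.1])]
x = [0.5, -0.25]
print(feed_forward(x, layers), feed_forward(x, layers, activation=math.tanh))
```

The Taylor polynomial diverges for large $|x|$. A practical deployment would therefore typically bound or normalize the pre-activations, or fit a polynomial over their expected range.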
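The hybrid technique feeds features from URLNet's embedding and convolution layers into a server-side LR classifier. It can be sketched in plaintext as below, under loud assumptions:

- The parameters are randomly initialised stand-ins for a trained URLNet's embedding matrix and convolution filters.
- Only one character-level filter width is used.
- The sizes, the alphabet, the ReLU and max-pooling choices, and the helper names `encode`, `conv_features` and `train_lr` are hypothetical rather than taken from the claim.

```python
import math
import random

random.seed(0)  # deterministic toy parameters

MAX_LEN = 32     # hypothetical fixed URL length in characters
EMBED_DIM = 4    # hypothetical embedding size
NUM_FILTERS = 3  # hypothetical number of convolution filters
KERNEL = 3       # hypothetical filter width in characters

# Hypothetical character vocabulary; index 0 is reserved for padding and unknown characters.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789.-/:_?=&"
CHAR_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}

# Random stand-ins for a trained URLNet's embedding matrix and convolution filters.
EMBEDDING = [[random.uniform(-1, 1) for _ in range(EMBED_DIM)]
             for _ in range(len(ALPHABET) + 1)]
EMBEDDING[0] = [0.0] * EMBED_DIM  # padding embeds to zeros
FILTERS = [[[random.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(KERNEL)]
           for _ in range(NUM_FILTERS)]


def encode(url):
    """Map a URL to exactly MAX_LEN character indices (truncate, then zero-pad)."""
    ids = [CHAR_INDEX.get(ch, 0) for ch in url.lower()[:MAX_LEN]]
    return ids + [0] * (MAX_LEN - len(ids))


def conv_features(url):
    """Embedding lookup, 1-D convolution with ReLU, then max-pooling over positions."""
    embedded = [EMBEDDING[i] for i in encode(url)]
    features = []
    for filt in FILTERS:
        best = 0.0  # starting at 0 applies ReLU: result is max(0, best window score)
        for start in range(MAX_LEN - KERNEL + 1):
            score = sum(filt[k][d] * embedded[start + k][d]
                        for k in range(KERNEL) for d in range(EMBED_DIM))
            best = max(best, score)
        features.append(best)
    return features


def train_lr(xs, ys, epochs=200, lr=0.1):
    """Batch gradient descent on the logistic (cross-entropy) loss."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y  # prediction minus label
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b


# Toy usage with hypothetical labels (0 = benign, 1 = malicious).
urls = ["http://example.com/index.html", "http://paypa1-login.example.net/verify?acct=1"]
weights, bias = train_lr([conv_features(u) for u in urls], [0, 1])
```

In the claim, both feature extraction and LR training run on the server. This sketch keeps everything in plaintext for brevity.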
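The client/server round trip can be sketched with the standard library alone if a simpler scheme is swapped in for FHE. The round trip covers client-side key generation and encryption, server-side computation on ciphertexts, and client-side decryption and classification. The sketch below uses Paillier encryption, which is only additively homomorphic, so it is not the claimed FHE method. It lets the server compute an encrypted linear LR score from encrypted features and plaintext weights. It cannot perform the ciphertext multiplications needed to evaluate the polynomial activation, so here the client applies the sigmoid after decryption. All of the following are hypothetical and insecure: the tiny Mersenne-prime key, the fixed-point `SCALE`, the weights, the label convention (a probability of at least 0.5 means malicious), and all function names.

```python
import math
import random

# Toy Paillier parameters (Mersenne primes 2**31 - 1 and 2**61 - 1).
# Far too small and structured for real use: illustration only.
P, Q = 2**31 - 1, 2**61 - 1
SCALE = 1000  # hypothetical fixed-point scale for real-valued LR weights


def keygen(p=P, q=Q):
    """Client: return (public key n, private key (n, lambda, mu)) with generator g = n + 1."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)  # requires Python 3.9+
    mu = pow(lam, -1, n)          # L(g^lam mod n^2) = lam mod n when g = n + 1
    return n, (n, lam, mu)


def encrypt(n, m):
    """Encrypt an integer m with the public key (negative values wrap modulo n)."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    # (n + 1)^m = 1 + m*n (mod n^2), so the generator power needs no exponentiation
    return (1 + (m % n) * n) * pow(r, n, n2) % n2


def decrypt(private_key, c):
    """Decrypt with the private key and map the result back to a signed integer."""
    n, lam, mu = private_key
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m


def server_linear_score(n, enc_features, weights, bias):
    """Server: Enc(sum(w_i * x_i) + b) using only the public key and plaintext weights."""
    n2 = n * n
    acc = encrypt(n, round(bias * SCALE))
    for c, w in zip(enc_features, weights):
        # Multiplying ciphertexts adds plaintexts; exponentiation scales a plaintext.
        acc = acc * pow(c, round(w * SCALE) % n, n2) % n2
    return acc


# Example round trip with a hypothetical 4-entry binary feature vector.
public_key, private_key = keygen()
features = [1, 0, 1, 1]
enc_features = [encrypt(public_key, x) for x in features]          # client -> server
enc_score = server_linear_score(public_key, enc_features,
                                [0.8, -1.2, 0.5, -0.3], 0.1)       # server -> client
score = decrypt(private_key, enc_score) / SCALE                    # 1.1 for these values
probability = 1.0 / (1.0 + math.exp(-score))
label = "malicious" if probability >= 0.5 else "benign"
```

Only the public key ever reaches the server, and only the client can decrypt the intermediate result. This mirrors the key split described in the claim, even though the scheme here is weaker than FHE.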