CPC H04L 63/1483 (2013.01) [H04L 41/16 (2013.01)] | 20 Claims |
1. A method implemented by processor circuitry for detecting email phishing using deep learning, the method comprising:
obtaining training email data including emails labeled as a phishing email or a non-phishing email, wherein:
the emails include hypertext markup language (HTML) tags having a hierarchal structure; and
the HTML tags include a tag name and a tag attribute;
generating training data including the training email data by, for each of the emails included in the training email data:
generating from the HTML tags of the email a directed acyclic graph (DAG) including nodes and an adjacency matrix, wherein:
each node represents an HTML tag of the email and includes the tag name of the represented HTML tag, the tag attribute, and a cascading style sheets (CSS) value; and
the CSS value is determined based on CSS properties of the email; and
determining the adjacency matrix of the DAG by parsing the HTML tags of the email to form a parse tree, wherein:
the adjacency matrix is a matrix of elements with each of the elements representing two of the nodes of the DAG as a node pair;
each of the elements has a value indicating whether the two nodes of the node pair represented by the element are adjacent or not adjacent; and
two nodes of a node pair are adjacent when both nodes are connected in the parse tree;
stochastically training one or more graph neural networks using the generated training data, such that when the one or more graph neural networks are applied to an email, the one or more graph neural networks output a label for the email as a phishing email or a non-phishing email, wherein:
noise is added during the stochastic training of the one or more graph neural networks, such that the trained one or more graph neural networks are more robust against small changes in the training data;
receiving a testing email having HTML tags; and
applying the one or more trained graph neural networks to the testing email and outputting the label for the testing email as a phishing email or a non-phishing email.
|
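The graph-construction step recited in claim 1 (parsing the HTML tags into a parse tree, creating one node per tag, and filling an adjacency matrix from parent-child connections) can be illustrated with a minimal Python sketch. This is not the patented implementation. The names `TagGraphBuilder`, `html_to_graph`, and `VOID_TAGS` are hypothetical, the inline `style` attribute is used only as an assumed stand-in for the claimed CSS value, and the matrix is assumed to be directed from parent to child.

```python
from html.parser import HTMLParser

# Assumption: void elements never get a closing tag, so they cannot be parents.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagGraphBuilder(HTMLParser):
    """Hypothetical helper: collects one node per HTML tag plus parent-child edges."""

    def __init__(self):
        super().__init__()
        self.nodes = []    # one dict per tag: name, attributes, CSS stand-in
        self.edges = []    # (parent_index, child_index) pairs from the parse tree
        self._stack = []   # indices of the tags that are currently open

    def _add_node(self, tag, attrs):
        attrs = dict(attrs)
        # Assumption: the inline style string stands in for the claimed CSS value.
        node = {"tag": tag, "attrs": attrs, "css": attrs.get("style") or ""}
        self.nodes.append(node)
        index = len(self.nodes) - 1
        if self._stack:
            self.edges.append((self._stack[-1], index))
        return index

    def handle_starttag(self, tag, attrs):
        index = self._add_node(tag, attrs)
        if tag not in VOID_TAGS:
            self._stack.append(index)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: add the node but never open it.
        self._add_node(tag, attrs)

    def handle_endtag(self, tag):
        # Close the nearest matching open tag; ignore stray closing tags.
        for pos in range(len(self._stack) - 1, -1, -1):
            if self.nodes[self._stack[pos]]["tag"] == tag:
                del self._stack[pos:]
                break


def html_to_graph(html):
    """Return (nodes, adjacency), where adjacency[i][j] == 1 if node i is the parent of node j."""
    builder = TagGraphBuilder()
    builder.feed(html)
    builder.close()
    size = len(builder.nodes)
    adjacency = [[0] * size for _ in range(size)]
    for parent, child in builder.edges:
        adjacency[parent][child] = 1
    return builder.nodes, adjacency


sample = ('<html><body><p style="color:red">Hi<br></p>'
          '<a href="http://example.com">x</a></body></html>')
nodes, adjacency = html_to_graph(sample)
```

Because every edge points from a parent tag to a child tag, the resulting graph has no cycles, which is consistent with the directed acyclic graph named in the claim. A symmetric matrix (setting both `adjacency[i][j]` and `adjacency[j][i]`) would be an equally plausible reading of "adjacent" in the claim.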