US 12,155,694 B2
Phishing detection using HTML
Natan Katz, Tel Aviv (IL); Raphael Lior Gozlan, Herzelia (IL); and Dor Livne, Tel Aviv (IL)
Assigned to Check Point Software Technologies Ltd., Tel Aviv (IL)
Filed by Check Point Software Technologies Ltd., Tel Aviv (IL)
Filed on Sep. 30, 2022, as Appl. No. 17/936,878.
Prior Publication US 2024/0114053 A1, Apr. 4, 2024
Int. Cl. G06F 21/00 (2013.01); H04L 9/40 (2022.01); H04L 29/06 (2006.01); H04L 41/16 (2022.01)
CPC H04L 63/1483 (2013.01) [H04L 41/16 (2013.01)]
20 Claims
 
1. A method implemented by processor circuitry for detecting email phishing using deep learning, the method comprising:
obtaining training email data including emails labeled as a phishing email or a non-phishing email, wherein:
the emails include hypertext markup language (HTML) tags having a hierarchal structure; and
the HTML tags include a tag name and a tag attribute;
generating training data including the training email data by, for each of the emails included in the training email data:
generating from the HTML tags of the email a directed acyclic graph (DAG) including nodes and an adjacency matrix, wherein:
each node represents an HTML tag of the email and includes the tag name of the represented HTML tag, the tag attribute, and a cascading style sheets (CSS) value; and
the CSS value is determined based on CSS properties of the email; and
determining the adjacency matrix of the DAG by parsing the HTML tags of the email to form a parse tree, wherein:
the adjacency matrix is a matrix of elements with each of the elements representing two of the nodes of the DAG as a node pair;
each of the elements has a value indicating whether the two nodes of the node pair represented by the element are adjacent or not adjacent; and
two nodes of a node pair are adjacent when both nodes are connected in the parse tree;
stochastically training one or more graph neural networks using the generated training data, such that when the one or more graph neural networks is applied to an email, the one or more graph neural networks outputs a label for the email as a phishing email or a non-phishing email, wherein:
noise is added during the stochastic training of the one or more graph neural networks, such that the trained one or more graph neural networks are more robust against small changes in the training data;
receiving a testing email having HTML tags; and
applying the one or more trained graph neural networks to the testing email and outputting the label for the testing email as a phishing email or a non-phishing email.
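
For illustration only, the graph-construction step recited in claim 1 (parsing an email's HTML tags into a parse tree, then deriving nodes and an adjacency matrix) might be sketched as follows. This is not the patented implementation. The names `TagGraphBuilder` and `html_to_graph` are hypothetical. The sketch also assumes that the CSS value is the tag's inline `style` attribute and that adjacency is recorded in the parent-to-child direction only; the claim fixes neither detail.

```python
from html.parser import HTMLParser

# HTML void elements never have children or a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagGraphBuilder(HTMLParser):
    """Hypothetical helper: collects one node per HTML tag plus parent-child edges."""

    def __init__(self):
        super().__init__()
        self.nodes = []   # one dict per tag: name, attributes, CSS value
        self.edges = []   # (parent_index, child_index) pairs from the parse tree
        self._stack = []  # indices of the tags that are currently open

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        index = len(self.nodes)
        self.nodes.append({
            "tag": tag,
            "attrs": attr_map,
            # Assumption: the CSS value is taken from the inline style attribute.
            "css": attr_map.get("style") or "",
        })
        if self._stack:
            self.edges.append((self._stack[-1], index))
        if tag not in VOID_TAGS:
            self._stack.append(index)

    def handle_endtag(self, tag):
        # Close the nearest matching open tag; unmatched end tags are ignored.
        for pos in range(len(self._stack) - 1, -1, -1):
            if self.nodes[self._stack[pos]]["tag"] == tag:
                del self._stack[pos:]
                break


def html_to_graph(html):
    """Return (nodes, adjacency) where adjacency[i][j] == 1 if node i is the parent of node j."""
    builder = TagGraphBuilder()
    builder.feed(html)
    builder.close()
    size = len(builder.nodes)
    adjacency = [[0] * size for _ in range(size)]
    for parent, child in builder.edges:
        adjacency[parent][child] = 1
    return builder.nodes, adjacency


sample = ('<html><body><a href="http://example.test" style="color:red">'
          'Click</a><br></body></html>')
nodes, adjacency = html_to_graph(sample)
```

For the sample above, the nodes are `html`, `body`, `a`, and `br` in document order, and the only nonzero entries of the adjacency matrix are (html, body), (body, a), and (body, br). In a pipeline of the kind the claim describes, a graph like this would be produced for each labeled email and passed to the graph neural network training step, which this sketch does not cover.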