CPC H04L 63/1483 (2013.01) [H04L 63/0281 (2013.01); H04L 63/1408 (2013.01)] | 20 Claims
1. A phishing classifier that classifies a uniform resource locator (URL) and a content page accessed via the URL as phishing or not phishing, including:
a URL feature hasher that parses the URL into features and hashes the features to produce URL feature hashes;
a hypertext markup language (HTML) encoder, trained on HTML tokens:
extracted from content pages at example URLs,
encoded into an embedding space, then
decoded to reproduce images captured from rendering of the content pages,
wherein the trained HTML encoder produces an HTML encoding of HTML tokens extracted from the content page; and
phishing classifier layers,
trained on URL feature hashes and HTML encodings of the example URLs, each example URL accompanied by a ground truth classification as phishing or as not phishing,
wherein the phishing classifier layers process the URL feature hashes and the HTML encoding of the URL to produce at least one likelihood score that the URL and the content page accessed via the URL present a phishing risk.
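The URL feature hasher can be sketched as follows. The claim does not say which features are parsed out of the URL, so the feature set here (scheme, host, host labels, path segments, query-parameter names, a port flag) is an assumption, as are the names `url_features`, `hash_feature`, `url_feature_hashes` and `NUM_BUCKETS`. BLAKE2b is used only as a stable bucketing hash, because Python's built-in `hash()` is salted per process and would assign different buckets on each run.

```python
import hashlib
from urllib.parse import urlsplit

NUM_BUCKETS = 2 ** 16  # hypothetical size of the hashed feature space


def url_features(url: str) -> list[str]:
    """Split a URL into string features (assumed feature set, not from the claim)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    features = [f"scheme:{parts.scheme}", f"host:{host}"]
    # One feature per host label, e.g. "login", "example", "com"
    features += [f"host_label:{label}" for label in host.split(".") if label]
    # One feature per non-empty path segment, e.g. "account", "verify"
    features += [f"path:{seg}" for seg in parts.path.split("/") if seg]
    # Query parameter names only; values are ignored
    for pair in parts.query.split("&"):
        if pair:
            features.append(f"query_key:{pair.split('=', 1)[0]}")
    features.append(f"has_port:{parts.port is not None}")
    return features


def hash_feature(feature: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a feature string to a stable bucket index in [0, num_buckets)."""
    digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little") % num_buckets


def url_feature_hashes(url: str, num_buckets: int = NUM_BUCKETS) -> list[int]:
    """Parse the URL into features and hash each one."""
    return [hash_feature(f, num_buckets) for f in url_features(url)]
```

For example, `url_feature_hashes("https://login.example.com/account/verify?user=1&next=2")` yields ten bucket indices, one per feature. Collisions between features are tolerated, which is the usual trade-off of the hashing trick for a fixed-size input.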
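Extracting HTML tokens from a content page, before they reach the HTML encoder, might look like the sketch below, which uses the standard-library `html.parser` module. The token scheme (opening and closing tag names, attribute names, the values of `action` and `type` attributes, lowercased words of text) and the `max_tokens` cap are assumptions; the claim does not define what an HTML token is. Text inside `<script>` and `<style>` elements is tokenized like any other text here.

```python
from html.parser import HTMLParser


class HTMLTokenExtractor(HTMLParser):
    """Collect a flat token sequence from an HTML page (assumed token scheme)."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.tokens: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(f"<{tag}>")
        for name, value in attrs:
            self.tokens.append(f"@{name}")
            # Keep form targets and input types, e.g. type="password"
            if name in ("action", "type") and value:
                self.tokens.append(f"@{name}={value.lower()}")

    def handle_endtag(self, tag):
        self.tokens.append(f"</{tag}>")

    def handle_data(self, data):
        # Lowercased, whitespace-separated words of the text content
        self.tokens.extend(word.lower() for word in data.split())


def extract_html_tokens(html: str, max_tokens: int = 512) -> list[str]:
    """Tokenize a page and truncate to a fixed maximum length."""
    parser = HTMLTokenExtractor()
    parser.feed(html)
    parser.close()
    return parser.tokens[:max_tokens]
```

A login form such as `<form action="/login"><input type="password">Sign in</form>` becomes `<form>`, `@action`, `@action=/login`, `<input>`, `@type`, `@type=password`, `sign`, `in`, `</form>`.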
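The arrangement in which the HTML encoder is trained by encoding HTML tokens into an embedding space and decoding them to reproduce screenshots of the rendered pages can be illustrated with a toy linear encoder and decoder trained by plain gradient descent on mean squared pixel error. This is a stand-in, not the claimed architecture, which the claim does not specify. The bag-of-token count input, the embedding size, grayscale pixel targets scaled to [0, 1], and the class name `LinearHTMLAutoencoder` are all assumptions. After training, only `encode` is kept, mirroring the claim's use of the trained encoder alone to produce HTML encodings.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LinearHTMLAutoencoder:
    """Toy encoder/decoder: token-count vector -> embedding -> screenshot pixels."""

    def __init__(self, n_tokens, dim, n_pixels, seed=0):
        rng = random.Random(seed)
        self.enc = [[rng.gauss(0, 0.1) for _ in range(n_tokens)] for _ in range(dim)]
        self.dec = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_pixels)]

    def encode(self, x):
        """The part kept after training: map HTML token counts to an embedding."""
        return matvec(self.enc, x)

    def reconstruction_loss(self, x, pixels):
        recon = matvec(self.dec, self.encode(x))
        return sum((r - p) ** 2 for r, p in zip(recon, pixels)) / len(pixels)

    def train_step(self, x, pixels, lr=0.1):
        """One gradient-descent step on mean squared pixel error; returns the pre-step loss."""
        z = self.encode(x)
        recon = matvec(self.dec, z)
        n = len(pixels)
        # dLoss/dRecon for mean squared error
        g_recon = [2.0 * (r - p) / n for r, p in zip(recon, pixels)]
        # Backpropagate to the embedding before the decoder weights change
        g_z = [sum(self.dec[i][j] * g_recon[i] for i in range(n)) for j in range(len(z))]
        for i in range(n):
            for j in range(len(z)):
                self.dec[i][j] -= lr * g_recon[i] * z[j]
        for j in range(len(z)):
            for k in range(len(x)):
                self.enc[j][k] -= lr * g_z[j] * x[k]
        return sum((r - p) ** 2 for r, p in zip(recon, pixels)) / n
```

The decoder exists only to supply a training signal: forcing the embedding to carry enough information to redraw the page is what makes it useful to the classifier, even though the decoded image itself is discarded.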
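The phishing classifier layers can be approximated, for illustration, by a single logistic layer over the concatenation of a hashed count vector built from the URL feature hashes and the HTML encoding, trained by stochastic gradient descent on binary cross-entropy against each example's ground-truth label. The single layer, the concatenation, the class name `PhishingClassifierHead` and the helper `hashes_to_vector` are assumptions; the claim only requires layers that take both inputs and produce at least one likelihood score.

```python
import math


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def hashes_to_vector(hashes: list[int], num_buckets: int) -> list[float]:
    """Turn URL feature hashes into a fixed-length count vector."""
    vec = [0.0] * num_buckets
    for h in hashes:
        vec[h % num_buckets] += 1.0
    return vec


class PhishingClassifierHead:
    """One logistic layer over [URL hash counts ; HTML encoding] (assumed architecture)."""

    def __init__(self, num_buckets: int, html_dim: int) -> None:
        self.num_buckets = num_buckets
        self.weights = [0.0] * (num_buckets + html_dim)
        self.bias = 0.0

    def _features(self, url_hashes, html_encoding):
        return hashes_to_vector(url_hashes, self.num_buckets) + list(html_encoding)

    def score(self, url_hashes, html_encoding) -> float:
        """Likelihood in (0, 1) that the URL and its content page are phishing."""
        x = self._features(url_hashes, html_encoding)
        logit = self.bias + sum(w * v for w, v in zip(self.weights, x))
        return sigmoid(logit)

    def train_step(self, url_hashes, html_encoding, label: int, lr: float = 0.1) -> float:
        """One SGD step on binary cross-entropy; label 1 = phishing, 0 = not phishing."""
        x = self._features(url_hashes, html_encoding)
        p = self.score(url_hashes, html_encoding)
        grad = p - label  # derivative of the loss with respect to the logit
        self.weights = [w - lr * grad * v for w, v in zip(self.weights, x)]
        self.bias -= lr * grad
        return p
```

An untrained head scores every input at exactly 0.5; training on labeled examples pushes the score toward 1 for phishing pages and toward 0 for benign ones.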