CPC H04L 63/1483 (2013.01) [G06F 18/214 (2023.01); H04L 61/4511 (2022.05); H04L 63/0236 (2013.01); H04L 63/1416 (2013.01); H04L 63/1425 (2013.01)] | 20 Claims |
1. A system for phishing domain detection, comprising:
a memory operable to store a training dataset comprising a plurality of received communications, wherein:
at least one training communication from the plurality of received communications comprises a text message or an email message and is known to be associated with a particular phishing domain;
the at least one training communication is associated with a first set of features comprising at least two of a first time of receipt, a first sender name, a first domain name, a first message sentiment, and a first attachment file associated with the at least one training communication; and
a processor, operably coupled with the memory, and configured to:
intercept a live communication that is intended to be received by a computing device before the live communication is received by the computing device, wherein the live communication comprises a text message or an email message and is associated with a particular domain;
extract, by a natural language processing (NLP) algorithm, a second set of features from the live communication, wherein the second set of features comprises at least two of a second time of receipt, a second sender name, a second domain name, a second message sentiment, and a second attachment file associated with the live communication;
for at least one feature from the second set of features:
compare the feature with a counterpart feature from the first set of features; and
determine whether the feature corresponds with the counterpart feature;
determine whether more than a threshold percentage of features from the second set of features corresponds with counterpart features from the first set of features;
predict whether the particular domain is the particular phishing domain in response to a determination of whether more than the threshold percentage of features from the second set of features corresponds with the counterpart features from the first set of features, wherein predicting whether the particular domain is the particular phishing domain comprises:
predicting that the particular domain is the particular phishing domain in response to determining that more than the threshold percentage of features from the second set of features corresponds with the counterpart features from the first set of features; and
predicting that the particular domain is not the particular phishing domain in response to determining that less than the threshold percentage of features from the second set of features corresponds with the counterpart features from the first set of features;
receive feedback indicating whether the particular domain is the particular phishing domain;
re-train the NLP algorithm based at least in part upon the received feedback;
determine, by the re-trained NLP algorithm, whether the particular domain is the particular phishing domain;
in response to determining that the particular domain is the particular phishing domain:
prevent communication of the live communication to the computing device;
in response to determining that the particular domain is not the particular phishing domain:
forward the live communication to the computing device.
|
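The feature comparison and threshold test recited in claim 1 can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the claim: the function names (`corresponds`, `matching_percentage`, `is_phishing_domain`, `route`), the dictionary representation of a feature set, and the example values. The claim does not define when a feature "corresponds" with its counterpart, so the sketch assumes case-insensitive string equality. It also assumes that a live feature with no counterpart counts as non-matching, and that a result exactly equal to the threshold is treated as not phishing, a case the claim leaves open. The feedback and re-training steps are not modeled.

```python
from typing import Mapping

# Hypothetical representation: feature name -> feature value (both strings).
FeatureSet = Mapping[str, str]


def corresponds(live_value: str, training_value: str) -> bool:
    """Assumed matching rule: equality after trimming and case folding."""
    return live_value.strip().casefold() == training_value.strip().casefold()


def matching_percentage(live: FeatureSet, training: FeatureSet) -> float:
    """Percentage of live (second-set) features matching their counterparts.

    A live feature without a counterpart in the training (first) set is
    counted as non-matching; this is an assumption of the sketch.
    """
    if not live:
        return 0.0
    matched = sum(
        1
        for name, value in live.items()
        if name in training and corresponds(value, training[name])
    )
    return 100.0 * matched / len(live)


def is_phishing_domain(
    live: FeatureSet, training: FeatureSet, threshold_percent: float
) -> bool:
    """Predict phishing only when strictly more than the threshold matches."""
    return matching_percentage(live, training) > threshold_percent


def route(
    live: FeatureSet, training: FeatureSet, threshold_percent: float = 50.0
) -> str:
    """Return "block" to withhold the message, "forward" to deliver it."""
    if is_phishing_domain(live, training, threshold_percent):
        return "block"
    return "forward"


# Illustrative values only; not data from the patent.
training = {
    "sender_name": "Acme Support",
    "domain_name": "acme-login.example",
    "sentiment": "urgent",
}
live = {
    "sender_name": "acme support",
    "domain_name": "acme-login.example",
    "sentiment": "neutral",
}
decision = route(live, training, threshold_percent=50.0)
```

In this example two of the three live features match, so the match percentage exceeds a 50 percent threshold and the message is withheld. Raising the threshold to 80 percent would forward the same message.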