| CPC G06N 3/08 (2013.01) [G06F 16/285 (2019.01); G06N 3/04 (2013.01)] | 18 Claims |

|
1. A method, comprising:
receiving, by a computer system, raw data containing sample domains, each of which has a known class identity, the known class identity indicating whether a domain is conducting an email campaign;
extracting, by the computer system, features from each of the sample domains;
selecting, by the computer system, features of interest from the features, the features of interest including at least a feature particular to a seed domain and features particular to email activities over a time line;
creating, by the computer system, feature vectors from the features of interest;
training, by the computer system, a machine learning model using the feature vectors, the training including optimizing a neural network structure iteratively until stopping criteria are satisfied, wherein the neural network structure is iteratively optimized using a local search procedure based on minimizing a false positive rate and maximizing a classification accuracy rate of an intermediate trained model during a verification process, and wherein, when the stopping criteria are met, the machine learning model is built and trained as an email campaign classifier; and
classifying, by the computer system, candidate domains with unknown class identities utilizing the machine learning model thus trained such that each of the candidate domains is classified as conducting or not conducting an email campaign.
|
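The claim recites a local search over the neural network structure that minimizes the false positive rate and maximizes the classification accuracy of an intermediate model on a verification set, but it does not say which structural parameters are searched, how the two objectives are combined, or what the stopping criteria are. The Python sketch below is illustrative only and assumes all three. A structure is taken to be a tuple of hidden-layer widths. The two rates are combined into one hypothetical score, `accuracy - fpr_weight * fpr`. The search stops at a local optimum or after `max_iters` rounds. Training on the feature vectors and verifying the intermediate model are abstracted into an `evaluate` callback. The names `Structure`, `rates`, `score`, `neighbors`, `local_search`, `fpr_weight`, `min_gain`, and `toy_evaluate` are all hypothetical and do not come from the claim.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical encoding of a neural network structure: widths of the hidden layers.
Structure = Tuple[int, ...]


def rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (false_positive_rate, accuracy); label 1 = conducting an email campaign."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 0:
            tn += 1
        else:
            fn += 1
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    return fpr, accuracy


def score(fpr: float, accuracy: float, fpr_weight: float = 2.0) -> float:
    """Assumed scalarization: reward accuracy, penalize false positives more heavily."""
    return accuracy - fpr_weight * fpr


def neighbors(s: Structure, step: int = 4, max_layers: int = 3) -> List[Structure]:
    """Structures one move away: widen or narrow a layer, append or drop a layer."""
    out: List[Structure] = []
    for i, w in enumerate(s):
        for d in (-step, step):
            if w + d > 0:
                out.append(s[:i] + (w + d,) + s[i + 1:])
    if len(s) < max_layers:
        out.append(s + (s[-1],))  # duplicate the last layer's width
    if len(s) > 1:
        out.append(s[:-1])  # never drop below one hidden layer
    return out


def local_search(
    start: Structure,
    evaluate: Callable[[Structure], Tuple[float, float]],
    max_iters: int = 50,
    min_gain: float = 1e-6,
) -> Tuple[Structure, float]:
    """Hill-climb over structures.

    `evaluate` is assumed to train an intermediate model with the given structure and
    return (fpr, accuracy) on the verification set. The search stops when no neighbor
    improves the score by at least `min_gain`, or after `max_iters` rounds.
    """
    best = start
    best_score = score(*evaluate(best))
    for _ in range(max_iters):
        cand_score, cand = max((score(*evaluate(n)), n) for n in neighbors(best))
        if cand_score < best_score + min_gain:
            break  # local optimum reached
        best, best_score = cand, cand_score
    return best, best_score


if __name__ == "__main__":
    # Toy stand-in for "train, then verify": metrics are best for one hidden layer of width 24.
    def toy_evaluate(s: Structure) -> Tuple[float, float]:
        distance = abs(sum(s) - 24) + 10 * (len(s) - 1)
        return min(1.0, 0.01 * distance), max(0.0, 0.95 - 0.005 * distance)

    print(local_search((8,), toy_evaluate))
```

Starting from `(8,)`, the toy search widens the single layer step by step until it reaches `(24,)`. At that point no neighbor scores higher, so the search stops. In a real system the selected structure would then be retrained as the email campaign classifier and applied to the feature vectors of the candidate domains. Because this is local search, it can settle on a local rather than a global optimum. The weighting `fpr_weight` is one assumed way to trade false positives against accuracy. A lexicographic rule, such as "lowest false positive rate, then highest accuracy," would fit the claim language equally well.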