US 12,346,801 B2
Multi-level ensemble classifiers for cybersecurity machine learning applications
John W. Conwell, Tukwila, WA (US); and Sean M. McNee, Seattle, WA (US)
Assigned to Domain Tools Holdings, LLC, Lewes, DE (US)
Filed by DomainTools, LLC, Seattle, WA (US)
Filed on Nov. 9, 2020, as Appl. No. 17/093,606.
Prior Publication US 2022/0147815 A1, May 12, 2022
Int. Cl. G06N 3/08 (2023.01); G06N 3/045 (2023.01); H04L 9/40 (2022.01)

CPC G06N 3/08 (2013.01) [G06N 3/045 (2023.01); H04L 63/1416 (2013.01)]

27 Claims

1. A method in a computing system, comprising:

determining from a plurality of trained machine learning models stored in a repository, a plurality of model subsets, each model subset including a plurality of model instances sharing a same machine learning algorithm, same model tuning parameters, and same feature vector values as input data, and each of the plurality of model instances of a same model subset trained using a different sampling of training data, wherein each feature vector for each model subset encodes and transforms a subset of feature classes of interest to a respective model subset, and wherein at least two of the plurality of model subsets have different feature vectors;

for each of a plurality of model subsets,

under control of the computing system automatically executing each of the plurality of model instances of each model subset to produce a model output and combining the plurality of model outputs to produce a single model subset output;

forwarding the single model subset output to a plurality of ensemble master classifiers;

under control of at least one of the plurality of ensemble master classifiers, automatically,

receiving as input the single model subset output from at least two different model subsets, wherein each of the single model subset outputs result from processing different internet infrastructure data associated with different feature classes of interest;

initializing weights to correspond to each of the single model subset outputs from the at least two different model subsets thereby defining amount of participation of each of the model subsets participating in the at least one of the plurality of ensemble master classifiers;

performing a machine learning classification on the received input according to current weights associated with each of the single model subsets to generate a score;

optimizing the results of the performed classification by adjusting the weights and re-performing the machine learning classification to regenerate a score until a determined optimization value is satisfied; and

forwarding the score after the optimization value is satisfied as the final score of the ensemble master classifier; and

deploying the resultant optimized at least one of the plurality of ensemble master classifiers as a cybersecurity threat analyzer such that, when the deployed cybersecurity threat analyzer is applied to an input of an unknown domain that is not yet known to be malicious, the cybersecurity threat analyzer automatically predicts and indicates a metric as the final score, without human intervention in the prediction process, whether the unknown domain is malicious.
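
The flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything specific in it is an assumption made for illustration: the logistic-regression base learner, the bootstrap sampling, the mean used to combine instance outputs, the weighted average used as the master classifier, the log-loss hill climb standing in for the "determined optimization value", the synthetic data, and all function and variable names. The claim recites a plurality of ensemble master classifiers; only one is shown, and for brevity the weights are tuned on the same rows used to train the instances.

```python
import math
import random


def predict(w, x):
    """Logistic prediction; the last entry of w is the bias."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))


def train_logistic(rows, labels, epochs=100, lr=0.5):
    """Fit a small logistic regression by batch gradient descent."""
    n = len(rows[0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for x, y in zip(rows, labels):
            err = predict(w, x) - y
            for j in range(n):
                grad[j] += err * x[j]
            grad[n] += err
        w = [wj - lr * g / len(rows) for wj, g in zip(w, grad)]
    return w


def train_subset(rows, labels, columns, n_instances, rng):
    """Same algorithm, parameters and feature columns; one bootstrap sample per instance."""
    projected = [[r[c] for c in columns] for r in rows]
    instances = []
    for _ in range(n_instances):
        idx = [rng.randrange(len(rows)) for _ in rows]
        instances.append(train_logistic([projected[i] for i in idx],
                                        [labels[i] for i in idx]))
    return {"columns": columns, "instances": instances}


def subset_output(subset, row):
    """Combine the instance outputs into a single subset output (here, the mean)."""
    x = [row[c] for c in subset["columns"]]
    outs = [predict(w, x) for w in subset["instances"]]
    return sum(outs) / len(outs)


def master_score(weights, outputs):
    """Weighted combination of the subset outputs (assumed master classifier)."""
    return sum(w * o for w, o in zip(weights, outputs)) / sum(weights)


def log_loss(weights, all_outputs, labels):
    """Mean log loss of the master score over labelled rows."""
    total = 0.0
    for outs, y in zip(all_outputs, labels):
        p = min(1 - 1e-9, max(1e-9, master_score(weights, outs)))
        total -= math.log(p) if y else math.log(1 - p)
    return total / len(labels)


def optimize_weights(all_outputs, labels, target_loss, max_rounds=50, step=0.1):
    """Adjust weights and re-score until the loss target is met or no step helps."""
    weights = [1.0] * len(all_outputs[0])  # initial, equal participation
    best = log_loss(weights, all_outputs, labels)
    for _ in range(max_rounds):
        if best <= target_loss:
            break
        improved = False
        for i in range(len(weights)):
            for delta in (step, -step):
                trial = list(weights)
                trial[i] = max(0.01, trial[i] + delta)
                loss = log_loss(trial, all_outputs, labels)
                if loss < best:
                    weights, best, improved = trial, loss, True
        if not improved:
            break
    return weights, best


# Synthetic stand-in data: columns 0-1 and 2-3 play the role of two feature classes.
rng = random.Random(0)
rows = [[rng.random() for _ in range(4)] for _ in range(200)]
labels = [1 if r[0] + r[2] > 1.0 else 0 for r in rows]

subsets = [train_subset(rows, labels, cols, 5, rng) for cols in ([0, 1], [2, 3])]
all_outputs = [[subset_output(s, r) for s in subsets] for r in rows]

initial_loss = log_loss([1.0, 1.0], all_outputs, labels)
weights, final_loss = optimize_weights(all_outputs, labels, target_loss=0.3)

# Score an unseen row, standing in for an unknown domain.
unknown = [0.9, 0.2, 0.8, 0.5]
score = master_score(weights, [subset_output(s, unknown) for s in subsets])
```

In this sketch the two subsets see different feature columns, which mirrors the requirement that at least two model subsets have different feature vectors, and `score` plays the role of the final score produced for an unknown domain. A production system would hold out separate data for weight optimization and would draw its features from real internet infrastructure data rather than random numbers.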