CPC G06N 20/00 (2019.01) [G06F 16/906 (2019.01); G06F 21/50 (2013.01)] | 20 Claims |
1. A computer-implemented method comprising:
selecting a feature merging threshold (α), from a set of candidate α values, the set comprising multiple α values, and the feature merging threshold α being for determining equivalence between two features, wherein the selecting considers all of the multiple α values together in training respective model whitelists for the multiple α values, and wherein the selecting comprises:
partitioning training data into a plurality of groups;
establishing a respective model Wα for each α value of the set of candidate α values, the establishing producing multiple models Wα, each corresponding to an α value of the multiple α values;
iteratively performing, using a training set:
selecting a next group of training data of the plurality of groups of training data;
adding the selected next group of training data to the training set;
for each α value in the set of candidate α values:
training the Wα for the α value using the training set with the added selected next group of training data, wherein the training comprises (i) monitoring, by hooking machine instructions executing on a system, function calls invoked by an application based on the application opening and rendering documents of the training set, and (ii) merging features, determined from the monitoring, according to the α value, to produce the trained Wα for the α value; and
evaluating a size of Wα, the size comprising a number of features included in the model after the training the Wα for the α value using the training set with the added selected next group of training data;
wherein whether to continue the iteratively performing is based at least in part on the evaluated size of every Wα for the set of candidate α values as a result of the training; and
choosing the feature merging threshold α based on the iteratively performing.
|
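The iterative selection recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation, and several details are assumptions made only for illustration: features are given as precomputed tuples of tokens rather than obtained by hooking function calls, feature equivalence is tested with Jaccard similarity against α, iteration stops once no whitelist grows by more than a hypothetical tolerance `tol`, and the chosen α is the largest candidate whose whitelist was stable in the final round. The claim itself fixes none of these choices. The names `jaccard`, `train_whitelist`, `select_alpha`, and `SAMPLE_GROUPS` are hypothetical.

```python
def jaccard(a, b):
    """Similarity of two features, each a tuple of tokens (assumed representation)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def train_whitelist(features, alpha):
    """Build W_alpha by greedy merging (assumed rule): a feature is kept only
    if no already-kept feature is equivalent to it under threshold alpha."""
    whitelist = []
    for feat in features:
        if not any(jaccard(feat, kept) >= alpha for kept in whitelist):
            whitelist.append(feat)
    return whitelist


def select_alpha(groups, alphas, tol=0):
    """Train one whitelist per candidate alpha on a growing training set.

    Returns (chosen alpha, final whitelist sizes, number of rounds run).
    The stopping rule and the final choice rule are illustrative assumptions.
    """
    training_set = []
    sizes, growth = {}, {}
    rounds = 0
    for group in groups:
        rounds += 1
        training_set.extend(group)  # add the next group to the training set
        new_sizes = {a: len(train_whitelist(training_set, a)) for a in alphas}
        growth = {a: new_sizes[a] - sizes.get(a, 0) for a in alphas}
        sizes = new_sizes
        # Continue or stop based on the evaluated size of every W_alpha.
        if rounds > 1 and all(g <= tol for g in growth.values()):
            break
    stable = [a for a in alphas if growth.get(a, 0) <= tol]
    chosen = max(stable) if stable else min(alphas, key=lambda a: growth[a])
    return chosen, sizes, rounds


# Hypothetical training data: each group is a list of token-tuple features.
SAMPLE_GROUPS = [
    [("open", "tmp", "a"), ("read", "reg", "x")],
    [("open", "tmp", "b")],
    [("open", "tmp", "a")],
    [("write", "net", "z")],
]

chosen, sizes, rounds = select_alpha(SAMPLE_GROUPS, [0.4, 1.0])
print(chosen, sizes, rounds)
```

In this sketch every whitelist is retrained from the full training set each round, which keeps the per-α models independent and makes their sizes directly comparable; an incremental update would be a reasonable alternative under the same assumptions.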