CPC G06F 16/951 (2019.01) [G06F 40/205 (2020.01); G06N 3/04 (2013.01); G06N 3/082 (2013.01)] | 20 Claims |
1. A method comprising using at least one hardware processor to:
  during a training mode, train a model to predict a class, from a plurality of classes in a taxonomy of web-based activities, based on a training dataset that comprises a plurality of annotated features, wherein each of the plurality of annotated features comprises one or more features, which have been derived from a uniform resource locator (URL) of an online resource and metadata associated with that online resource, and a ground-truth class assigned to those one or more features; and,
  during an operation mode,
    acquire web data comprising one or more activity records, wherein each of the one or more activity records comprises a URL of an online resource that was accessed by a visitor, and metadata associated with that online resource, and,
    for each of the one or more activity records,
      extract a set of one or more features from the URL and the metadata in the activity record,
      apply the trained model to the set of one or more features to predict a class, from the plurality of classes in the taxonomy, that is associated with the set of one or more features, and
      store the predicted class in association with the URL in the activity record.
|
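
The two modes recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not specify a model type or feature set, so a small multinomial naive Bayes classifier over URL and metadata tokens is used here purely as a stand-in. The class labels, the example URLs, the `title` metadata field, and the names `extract_features`, `NaiveBayesModel`, and `predicted_by_url` are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from urllib.parse import urlparse


def extract_features(url, metadata):
    """Derive lowercase tokens from the URL host/path and the metadata values."""
    parsed = urlparse(url)
    text = " ".join([parsed.netloc, parsed.path, *metadata.values()])
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesModel:
    """Multinomial naive Bayes with add-one smoothing (illustrative stand-in)."""

    def __init__(self):
        self.class_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocabulary = set()

    def train(self, annotated_features):
        # Each item pairs a feature list with its ground-truth class.
        for features, label in annotated_features:
            self.class_counts[label] += 1
            self.token_counts[label].update(features)
            self.vocabulary.update(features)

    def predict(self, features):
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            denom = sum(self.token_counts[label].values()) + vocab_size
            score = math.log(count / total)
            for token in features:
                score += math.log((self.token_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Training mode: hypothetical (URL, metadata, ground-truth class) examples.
TRAINING_RECORDS = [
    ("https://example.com/pricing/enterprise", {"title": "Pricing plans"}, "evaluating_purchase"),
    ("https://example.com/blog/what-is-etl", {"title": "What is ETL"}, "researching_topic"),
    ("https://example.com/careers/openings", {"title": "Open positions"}, "job_seeking"),
]
model = NaiveBayesModel()
model.train([(extract_features(url, meta), label) for url, meta, label in TRAINING_RECORDS])

# Operation mode: extract features per activity record, predict, store by URL.
activity_records = [
    {"url": "https://example.org/products/pricing", "metadata": {"title": "Plans and pricing"}},
]
predicted_by_url = {}
for record in activity_records:
    features = extract_features(record["url"], record["metadata"])
    predicted_by_url[record["url"]] = model.predict(features)
```

In this sketch the same `extract_features` function is applied in both modes, so the features seen at prediction time are derived the same way as the annotated features used for training. A production system would likely replace the dictionary with a persistent store and the toy classifier with whatever model the specification actually describes.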