CPC G06F 21/577 (2013.01) [G06F 16/338 (2019.01); G06F 16/355 (2019.01); G06F 16/36 (2019.01)] | 20 Claims |
1. A method, comprising:
sending, by a computer, a query containing keywords to a content database;
receiving, by the computer, unstructured content items from the content database, each of the unstructured content items containing the keywords;
determining, by the computer, topics based on the unstructured content items received from the content database;
clustering, by the computer based at least on the topics, the unstructured content items into a plurality of clusters;
determining, by the computer, a cluster of interest from the plurality of clusters based at least in part on a percentage of vetted cybersecurity phrases in each of the unstructured content items received from the content database, wherein the cluster of interest has a higher percentage of the vetted cybersecurity phrases than the rest of the plurality of clusters;
determining, by the computer, whether the cluster of interest meets or exceeds a predetermined threshold representing a desired level of precision in terms of a number of unstructured content items in the cluster of interest as measured against a number of unstructured content items received from the content database; and
responsive to the cluster of interest meeting or exceeding the predetermined threshold, storing, by the computer, the query as a classifier rule in a rules database such that a cybersecurity classifier is able to classify unstructured social content using the classifier rule generated based on the query.
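The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the content database is an in-memory list, topic determination is replaced by a simple seed-word overlap (a real system would more likely use a topic model), and the "percentage of vetted cybersecurity phrases" is assumed to mean the share of an item's words covered by vetted phrases, averaged per cluster. The names `VETTED_PHRASES`, `TOPIC_SEEDS`, `run_query`, `assign_topic`, `vetted_percentage`, and `evaluate_query` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical vetted phrase list and topic seed words (illustrative only).
VETTED_PHRASES = {"phishing", "malware", "ransomware", "data breach"}
TOPIC_SEEDS = {
    "security": {"phishing", "malware", "ransomware", "breach"},
    "sports": {"game", "team", "score"},
}


def run_query(query, database):
    """Return the items that contain every keyword of the query."""
    keywords = query.lower().split()
    return [item for item in database
            if all(k in item.lower() for k in keywords)]


def assign_topic(text):
    """Stand-in for topic modeling: topic whose seed words overlap the text most."""
    words = set(text.lower().split())
    best = max(TOPIC_SEEDS, key=lambda t: len(TOPIC_SEEDS[t] & words))
    return best if TOPIC_SEEDS[best] & words else "other"


def vetted_percentage(text):
    """Assumed metric: share of an item's words covered by vetted phrases.

    Uses naive substring counting, so it is only a rough approximation.
    """
    lowered = text.lower()
    words = lowered.split()
    if not words:
        return 0.0
    hits = sum(lowered.count(p) * len(p.split()) for p in VETTED_PHRASES)
    return min(1.0, hits / len(words))


def evaluate_query(query, items, threshold, rules):
    """Cluster items by topic, pick the cluster of interest, and store the
    query as a rule if that cluster's share of all items meets the threshold."""
    if not items:
        return False
    clusters = defaultdict(list)
    for item in items:
        clusters[assign_topic(item)].append(item)

    def mean_vetted(cluster):
        return sum(vetted_percentage(i) for i in cluster) / len(cluster)

    # Cluster of interest: highest average vetted-phrase percentage.
    interest = max(clusters.values(), key=mean_vetted)
    # Precision: items in the cluster of interest vs. all items received.
    precision = len(interest) / len(items)
    if precision >= threshold:
        rules.append(query)  # the query itself becomes a classifier rule
        return True
    return False
```

A caller would pass the result of `run_query` to `evaluate_query` together with a threshold such as 0.6 and a list standing in for the rules database. The threshold value and the list-based rules store are placeholders, since the claim fixes neither.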