US 11,934,535 B2
System and method for identifying cyberthreats from unstructured social media content
Daniel Clark Salo, Sunnyvale, CA (US)
Assigned to Proofpoint, Inc., Sunnyvale, CA (US)
Filed by PROOFPOINT, INC., Sunnyvale, CA (US)
Filed on Feb. 15, 2023, as Appl. No. 18/169,627.
Application 18/169,627 is a continuation of application No. 16/823,090, filed on Mar. 18, 2020, granted, now Pat. No. 11,586,739.
Claims priority of provisional application 62/955,595, filed on Dec. 31, 2019.
Prior Publication US 2023/0205894 A1, Jun. 29, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/00 (2019.01); G06F 16/338 (2019.01); G06F 16/35 (2019.01); G06F 16/36 (2019.01); G06F 21/57 (2013.01)

CPC G06F 21/577 (2013.01) [G06F 16/338 (2019.01); G06F 16/355 (2019.01); G06F 16/36 (2019.01)]

20 Claims

1. A method, comprising:

sending, by a computer, a query containing keywords to a content database;

receiving, by the computer, unstructured content items from the content database, each of the unstructured content items containing the keywords;

determining, by the computer, topics based on the unstructured content items received from the content database;

clustering, by a computer based at least on the topics, the unstructured content items into a plurality of clusters;

determining, by the computer, a cluster of interest from the plurality of clusters based at least in part on a percentage of vetted cybersecurity phrases in each of the unstructured content items received from the content database, wherein the cluster of interest has a higher percentage of the vetted cybersecurity phrases than the rest of the plurality of clusters;

determining, by the computer, whether the cluster of interest meets or exceeds a predetermined threshold representing a desired level of precision in terms of a number of unstructured content items in the cluster of interest as measured against a number of unstructured content items received from the content database; and

responsive to the cluster of interest meeting or exceeding the predetermined threshold, storing, by the computer, the query as a classifier rule in a rules database such that a cybersecurity classifier is able to classify unstructured social content using the classifier rule generated based on the query.
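
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and every name in it (run_query, determine_topics, cluster_by_topic, vetted_share, pick_cluster_of_interest, evaluate_query, the stopword list, the sample data) is hypothetical. Several simplifying assumptions are made: the content database and rules database are in-memory lists, topic determination is replaced by a simple shared-token count rather than a real topic model, the "percentage of vetted cybersecurity phrases" is read as the share of the vetted list found in an item, and precision is read as the size of the cluster of interest divided by the number of items the query returned.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"a", "an", "the", "at", "in", "on", "of", "to", "and", "with", "via"}

def tokenize(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def run_query(keywords, content_db):
    """Return the items that contain every keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    return [item for item in content_db if all(k in item.lower() for k in wanted)]

def determine_topics(items, keywords, n_topics=2):
    """Toy topic step (assumption): the most widely shared non-keyword tokens."""
    skip = STOPWORDS | {k.lower() for k in keywords}
    counts = Counter()
    for item in items:
        # dict.fromkeys keeps one copy of each token, in first-seen order.
        counts.update(t for t in dict.fromkeys(tokenize(item)) if t not in skip)
    return [token for token, _ in counts.most_common(n_topics)]

def cluster_by_topic(items, topics):
    """Assign each item to the first topic it mentions, else to 'other'."""
    clusters = defaultdict(list)
    for item in items:
        tokens = set(tokenize(item))
        label = next((t for t in topics if t in tokens), "other")
        clusters[label].append(item)
    return dict(clusters)

def vetted_share(item, vetted_phrases):
    """Share of the vetted phrases that occur in the item (one reading of the claim)."""
    text = item.lower()
    return sum(p.lower() in text for p in vetted_phrases) / len(vetted_phrases)

def pick_cluster_of_interest(clusters, vetted_phrases):
    """Label of the cluster whose items have the highest mean vetted share."""
    def mean_share(label):
        members = clusters[label]
        return sum(vetted_share(m, vetted_phrases) for m in members) / len(members)
    return max(clusters, key=mean_share)

def evaluate_query(keywords, content_db, vetted_phrases, rules_db, threshold):
    """Run the claimed steps; store the query as a rule if precision is high enough."""
    items = run_query(keywords, content_db)
    if not items:
        return 0.0, False
    topics = determine_topics(items, keywords)
    clusters = cluster_by_topic(items, topics)
    label = pick_cluster_of_interest(clusters, vetted_phrases)
    # Precision: items in the cluster of interest versus all items received.
    precision = len(clusters[label]) / len(items)
    if precision >= threshold:
        rules_db.append(list(keywords))
        return precision, True
    return precision, False

# Hypothetical sample data.
content_db = [
    "Phishing email steals bank login credentials",
    "Phishing kit harvests bank credentials via malware",
    "Phishing malware dumps stolen bank credentials",
    "Gone phishing at the river bank with dad",
    "Best pizza in town",
]
vetted = ["credentials", "malware", "login"]
rules_db = []
precision, stored = evaluate_query(["phishing", "bank"], content_db, vetted, rules_db, 0.7)
print(precision, stored, rules_db)
```

In this sample, four items match the query and three of them fall into the cluster with the highest share of vetted phrases, so the query clears a 0.7 threshold and is stored. Raising the threshold to 0.8 would leave the rules list unchanged, which illustrates how the threshold gates which queries become classifier rules.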