US 12,323,462 B2
Identifying legitimate websites to remove false positives from domain discovery analysis
Hung-Jen Chang, Fremont, CA (US); Gaurav Mitesh Dalal, Fremont, CA (US); and Ali Mesdaq, San Jose, CA (US)
Assigned to Proofpoint, Inc., Sunnyvale, CA (US)
Filed by Proofpoint, Inc., Sunnyvale, CA (US)
Filed on Mar. 5, 2024, as Appl. No. 18/595,625.
Application 18/595,625 is a continuation of application No. 17/992,180, filed on Nov. 22, 2022, granted, now 11,956,272.
Application 17/992,180 is a continuation of application No. 16/718,809, filed on Dec. 18, 2019, granted, now 11,539,745, issued on Dec. 27, 2022.
Claims priority of provisional application 62/822,264, filed on Mar. 22, 2019.
Prior Publication US 2025/0016198 A1, Jan. 9, 2025
This patent is subject to a terminal disclaimer.
Int. Cl. H04L 9/40 (2022.01); G06F 16/957 (2019.01); G06F 40/205 (2020.01); G06N 20/00 (2019.01); H04L 61/4511 (2022.01)
CPC H04L 63/1483 (2013.01) [G06F 16/9577 (2019.01); G06F 40/205 (2020.01); G06N 20/00 (2019.01); H04L 61/4511 (2022.05); H04L 63/1416 (2013.01); H04L 63/1425 (2013.01)]
20 Claims
 
1. A computing platform, comprising:
at least one processor;
a communication interface communicatively coupled to the at least one processor; and
memory storing computer-readable instructions that, when executed by the at least one processor, cause the computing platform to:
determine feature vectors corresponding to a tag structure of one or more pages associated with a first domain;
compare the feature vectors corresponding to the tag structure to the feature vectors corresponding to known legitimate domains of a baseline dataset, resulting in one or more structure analysis values comprising averages of top-N similarity scores for a plurality of selected N values; and
based on determining that the one or more structure analysis values exceed one or more predetermined structure analysis threshold values, send one or more commands directing a domain identification system to remove the first domain from a list of indeterminate domains maintained by the domain identification system.
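
The comparison recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not specify how the feature vectors are built or which similarity measure is used. The sketch assumes tag-count vectors and cosine similarity, and it assumes that every top-N average must exceed its threshold. All names (`tag_vector`, `top_n_averages`, `should_remove`) are hypothetical.

```python
import math
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts each opening tag seen in a page (assumed feature choice)."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def tag_vector(html):
    """Build a tag-count feature vector for one page."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed metric)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_n_averages(candidate, baseline, n_values):
    """Average the N highest similarity scores for each selected N."""
    scores = sorted((cosine(candidate, b) for b in baseline), reverse=True)
    averages = {}
    for n in n_values:
        top = scores[:n]
        averages[n] = sum(top) / len(top) if top else 0.0
    return averages


def should_remove(averages, thresholds):
    """Assumption: every top-N average must exceed its own threshold."""
    return all(averages[n] > thresholds[n] for n in thresholds)
```

A caller would build `baseline` from pages of known legitimate domains, compute `top_n_averages` for a page of the first domain, and, when `should_remove` returns `True`, issue the command that removes the domain from the indeterminate list. That last step depends on the domain identification system and is left out here.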