US 11,924,246 B2
Uniform resource locator classifier and visual comparison platform for malicious site detection
Brian Sanford Jones, Cary, NC (US); Zachary Mitchell Abzug, Durham, NC (US); Jeremy Thomas Jordan, Raleigh, NC (US); Giorgi Kvernadze, Salt Lake City, UT (US); and Dallan Quass, Lindon, UT (US)
Assigned to Proofpoint, Inc., Sunnyvale, CA (US)
Filed by Proofpoint, Inc., Sunnyvale, CA (US)
Filed on Feb. 1, 2023, as Appl. No. 18/104,487.
Application 18/104,487 is a continuation of application No. 16/830,923, filed on Mar. 26, 2020, granted, now 11,609,989.
Claims priority of provisional application 62/823,733, filed on Mar. 26, 2019.
Prior Publication US 2023/0188566 A1, Jun. 15, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. H04L 9/40 (2022.01); G06F 16/51 (2019.01); G06F 16/955 (2019.01); G06F 18/21 (2023.01); G06F 18/213 (2023.01); G06F 21/56 (2013.01); G06N 3/08 (2023.01); G06N 20/00 (2019.01); G06N 20/10 (2019.01)

CPC H04L 63/1483 (2013.01) [G06F 16/51 (2019.01); G06F 16/9566 (2019.01); G06F 18/213 (2023.01); G06F 18/217 (2023.01); G06F 21/56 (2013.01); G06N 3/08 (2013.01); G06N 20/00 (2019.01); G06N 20/10 (2019.01); H04L 63/1408 (2013.01); H04L 63/1416 (2013.01); H04L 63/1441 (2013.01); G06V 2201/09 (2022.01)]

20 Claims

1. A computing platform, comprising:

at least one processor;

a communication interface communicatively coupled to the at least one processor; and

memory storing computer-readable instructions that, when executed by the at least one processor, cause the computing platform to:

identify, by parsing a uniform resource locator (URL), one or more human-engineered features of a URL, wherein the one or more human-engineered features comprise one or more of: a protocol, a top level domain (TLD), a domain, a subdomain, a port, a port type, a path, or path components;

identify one or more deep learned features of the URL;

concatenate the one or more human-engineered features of the URL to the one or more deep learned features of the URL, resulting in a concatenated vector representation;

compute, by inputting the concatenated vector representation of the URL to a URL classifier, a first phish classification score;

in response to determining that the first phish classification score is between a first phish classification threshold and a second phish classification threshold, cause image data for the URL to be sent to a visual similarity classification platform, configured to produce a computer vision vector representation of the image data and a corresponding second phish classification score; and

in response to determining that the second phish classification score exceeds the first phish classification threshold, cause a cybersecurity server to perform a first action.
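
The flow recited in claim 1 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the patented implementation: the function names, the numeric encoding of the human-engineered features, the naive host split (last label as TLD, no public-suffix handling), the placeholder deep learned vector, and the example threshold values are all hypothetical. The claim does not state which threshold is lower or what happens outside the band, so the sketch treats the thresholds as an unordered pair and only decides whether the visual similarity check is triggered.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}  # assumption: only these two schemes

def parse_url_features(url: str) -> dict:
    """Parse a URL into the human-engineered features listed in the claim."""
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")
    default_port = DEFAULT_PORTS.get(parts.scheme)
    explicit_port = parts.port  # None when the URL carries no port
    is_default = explicit_port is None or explicit_port == default_port
    return {
        "protocol": parts.scheme,
        # Naive split (assumption): last label is the TLD, next is the domain.
        "tld": labels[-1] if labels[-1] else "",
        "domain": labels[-2] if len(labels) >= 2 else "",
        "subdomain": ".".join(labels[:-2]),
        "port": explicit_port if explicit_port is not None else default_port,
        "port_type": "default" if is_default else "non-default",
        "path": parts.path,
        "path_components": [c for c in parts.path.split("/") if c],
    }

def engineered_vector(features: dict) -> list[float]:
    """Hypothetical numeric encoding of the parsed features."""
    subdomain_labels = features["subdomain"].split(".") if features["subdomain"] else []
    return [
        1.0 if features["protocol"] == "https" else 0.0,
        float(len(subdomain_labels)),
        1.0 if features["port_type"] == "non-default" else 0.0,
        float(len(features["path_components"])),
    ]

def concatenate(engineered: list[float], deep: list[float]) -> list[float]:
    """Append the deep learned features to the human-engineered features."""
    return list(engineered) + list(deep)

def needs_visual_check(first_score: float, threshold_a: float, threshold_b: float) -> bool:
    """True when the first score lies strictly between the two thresholds."""
    low, high = min(threshold_a, threshold_b), max(threshold_a, threshold_b)
    return low < first_score < high

def should_perform_first_action(second_score: float, first_threshold: float) -> bool:
    """True when the second score exceeds the first threshold, as in the claim."""
    return second_score > first_threshold

# Example run with placeholder values (all numbers are illustrative).
features = parse_url_features("https://login.example.com:8443/a/b?x=1")
deep_features = [0.25, -0.5]  # stand-in for the output of a learned model
vector = concatenate(engineered_vector(features), deep_features)
```

In this sketch `vector` would be the input to a URL classifier, and the two predicate functions stand in for the threshold comparisons that decide whether image data is sent to the visual similarity classification platform and whether the cybersecurity server is asked to act.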