US 12,003,535 B2
Phishing URL detection using transformers
Jack Wilson Stokes, III, North Bend, WA (US); Pranav Ravindra Maneriker, Columbus, OH (US); Arunkumar Gururajan, Sammamish, WA (US); Diana Anca Carutasu, Bellevue, WA (US); and Edir Vinicio Garcia Lazo, Seattle, WA (US)
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC, Redmond, WA (US)
Filed by MICROSOFT TECHNOLOGY LICENSING, LLC, Redmond, WA (US)
Filed on Apr. 30, 2021, as Appl. No. 17/246,352.
Claims priority of provisional application 63/155,157, filed on Mar. 1, 2021.
Prior Publication US 2022/0279014 A1, Sep. 1, 2022
Int. Cl. H04L 29/06 (2006.01); G06F 40/284 (2020.01); G06N 3/04 (2023.01); G06N 3/045 (2023.01); G06N 3/08 (2023.01); H04L 9/40 (2022.01)
CPC H04L 63/1483 (2013.01) [G06F 40/284 (2020.01); G06N 3/045 (2023.01); G06N 3/08 (2013.01)]
20 Claims
 
1. One or more computer storage media having computer-executable instructions embodied thereon that, when executed by one or more processors, cause the one or more processors to perform a method for detecting a malicious uniform resource locator (URL), the method comprising:
tokenizing a URL into URL tokens;
tokenizing metadata associated with the URL into metadata tokens;
forming a token encoding from the URL tokens and the metadata tokens by generating a joint Byte Pair Encoding (BPE) that combines the URL tokens and the metadata tokens;
inputting the token encoding into a transformer model;
in response to inputting the token encoding into the transformer model, receiving an embedding vector from the transformer model;
calculating a decision statistic from the embedding vector; and
based on the decision statistic indicating the URL is malicious, taking a remedial action that limits access to a resource associated with the URL.
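
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation and makes several assumptions that do not come from the claim: the regex tokenizer, the `[CLS]` and `[SEP]` markers, the form of the metadata string, the eight-dimensional seeded embeddings, the single identity-projection self-attention layer (without positional encoding) standing in for a trained transformer model, the untrained logistic weights, and the 0.5 threshold. All function names are hypothetical.

```python
import math
import random
import re
from collections import Counter

DIM = 8  # assumed embedding width for this toy example


def tokenize(text):
    # Split into alphanumeric runs and single punctuation characters.
    return re.findall(r"[A-Za-z0-9]+|[^A-Za-z0-9\s]", text)


def apply_merge(symbols, pair):
    # Replace each adjacent occurrence of `pair` with one merged symbol.
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(words, num_merges):
    # Learn BPE merges over one corpus mixing URL and metadata tokens,
    # so both inputs share a single (joint) subword vocabulary.
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = Counter()
        for symbols, freq in corpus.items():
            merged[tuple(apply_merge(symbols, best))] += freq
        corpus = merged
    return merges


def bpe_encode(word, merges):
    symbols = list(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


def joint_encoding(url, metadata, merges):
    # Token encoding: URL pieces, a separator, then metadata pieces.
    pieces = ["[CLS]"]
    for tok in tokenize(url):
        pieces += bpe_encode(tok, merges)
    pieces.append("[SEP]")
    for tok in tokenize(metadata):
        pieces += bpe_encode(tok, merges)
    return pieces


def embed(piece):
    rng = random.Random(piece)  # deterministic placeholder vector per piece
    return [rng.uniform(-1, 1) for _ in range(DIM)]


def self_attention(vectors):
    # One head of scaled dot-product self-attention, identity projections.
    out = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(DIM) for k in vectors]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * v[d] for w, v in zip(weights, vectors))
                    for d in range(DIM)])
    return out


def embedding_vector(pieces):
    # Take the output at the first position ([CLS]) as the embedding vector.
    return self_attention([embed(p) for p in pieces])[0]


def decision_statistic(vec, weights, bias):
    # Logistic score in (0, 1); weights and bias are untrained placeholders.
    z = sum(w * x for w, x in zip(weights, vec)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def remedial_action(url, score, threshold=0.5):
    # Limiting access is modeled here as returning a "block" verdict.
    return f"block {url}" if score >= threshold else f"allow {url}"


url = "http://login.example.com/verify?id=42"
metadata = "referrer=mail.example.net content-type=text/html"  # assumed format
merges = learn_bpe(tokenize(url) + tokenize(metadata), num_merges=20)
pieces = joint_encoding(url, metadata, merges)
vec = embedding_vector(pieces)
score = decision_statistic(vec, weights=[0.1] * DIM, bias=0.0)
print(remedial_action(url, score))
```

Because the embeddings and classifier weights here are placeholders rather than trained parameters, the printed verdict carries no meaning. The sketch only shows how data flows from the joint encoding through the model to the decision statistic and the remedial action.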