US 12,438,912 B2
Phishing URL detection using transformers
Jack Wilson Stokes, III, North Bend, WA (US); Pranav Ravindra Maneriker, Columbus, OH (US); Arunkumar Gururajan, Sammamish, WA (US); Diana Anca Carutasu, Bellevue, WA (US); and Edir Vinicio Garcia Lazo, Seattle, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on May 9, 2024, as Appl. No. 18/660,104.
Application 18/660,104 is a continuation of application No. 17/246,352, filed on Apr. 30, 2021, granted, now 12,003,535.
Claims priority of provisional application 63/155,157, filed on Mar. 1, 2021.
Prior Publication US 2024/0297900 A1, Sep. 5, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. H04L 9/40 (2022.01); G06F 40/284 (2020.01); G06N 3/045 (2023.01); G06N 3/08 (2023.01)
CPC H04L 63/1483 (2013.01) [G06F 40/284 (2020.01); G06N 3/045 (2023.01); G06N 3/08 (2013.01)]
20 Claims
1. One or more computer storage media having computer-executable instructions embodied thereon that, when executed, by one or more processors, causes the one or more processors to perform a method comprising:
generating a set of Uniform Resource Locator (URL) tokens based on a URL;
generating a set of metadata tokens based on metadata associated with the URL;
generating a set of feature tokens based on the set of URL tokens, the set of metadata tokens, and a set of separator tokens by at least concatenating the set of URL tokens and the set of metadata tokens into the set of feature tokens including a first separator token from the set of separator tokens between a first metadata token of the set of metadata tokens and a second metadata token, wherein the first separator token indicates a type of metadata associated with the second metadata token;
providing the set of feature tokens as a single input vector to a transformer model;
obtaining an output of the transformer model including an embedding vector;
determining a decision statistic based on the embedding vector; and
as a result of the decision statistic indicating the URL is malicious, causing a remedial action to be performed, where the remedial action prevents a computing device from accessing the URL.
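
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the separator token names, the delimiter-based URL tokenizer, the metadata fields (`registrar`, `country`), the logistic scoring head, and the 0.5 threshold are all assumptions chosen for illustration. The transformer itself is omitted, so the embedding vector is passed in as a plain list of floats.

```python
import math
import re

# Hypothetical separator tokens, one per metadata type (names are assumptions).
# Each separator precedes the metadata tokens whose type it indicates.
SEPARATORS = {"registrar": "[SEP_REGISTRAR]", "country": "[SEP_COUNTRY]"}


def tokenize_url(url):
    """Split a URL on common delimiters, keeping each delimiter as a token."""
    return [t for t in re.split(r"([/:.?=&\-_])", url) if t]


def tokenize_metadata(value):
    """Lowercase a metadata value and split it on whitespace."""
    return value.lower().split()


def build_feature_tokens(url, metadata):
    """Concatenate URL tokens and metadata tokens into one sequence.

    A type-specific separator is placed before each metadata field, so the
    separator between two metadata fields names the type of the second one.
    """
    tokens = tokenize_url(url)
    for kind, value in metadata.items():
        tokens.append(SEPARATORS[kind])
        tokens.extend(tokenize_metadata(value))
    return tokens


def decision_statistic(embedding, weights, bias):
    """Assumed scoring head: logistic function of a linear projection."""
    z = sum(e * w for e, w in zip(embedding, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def should_block(score, threshold=0.5):
    """Assumed remedial-action trigger: block when the score meets the threshold."""
    return score >= threshold


tokens = build_feature_tokens(
    "http://a.com/x",
    {"registrar": "Example Registrar", "country": "US"},
)
print(tokens)
```

In this sketch, `[SEP_COUNTRY]` sits between the registrar tokens and the country token and signals that a country value follows, which mirrors the role the claim assigns to the first separator token. A real system would map the tokens to integer IDs, feed them to a trained transformer as a single input sequence, and compute the decision statistic from the resulting embedding vector.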