CPC H04L 63/1483 (2013.01) [G06F 40/284 (2020.01); G06N 3/045 (2023.01); G06N 3/08 (2013.01)] | 20 Claims |
1. One or more computer storage media having computer-executable instructions embodied thereon that, when executed by one or more processors, cause the one or more processors to perform a method for detecting a malicious uniform resource locator (URL), the method comprising:
tokenizing a URL into URL tokens;
tokenizing metadata associated with the URL into metadata tokens;
forming a token encoding from the URL tokens and the metadata tokens by generating a joint Byte Pair Encoding (BPE) that combines the URL tokens and the metadata tokens;
inputting the token encoding into a transformer model;
in response to inputting the token encoding into the transformer model, receiving an embedding vector from the transformer model;
calculating a decision statistic from the embedding vector; and
based on the decision statistic indicating the URL is malicious, taking a remedial action that limits access to a resource associated with the URL.
|
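For illustration only, the claimed steps can be sketched in Python. This is a minimal sketch under stated assumptions, not the claimed implementation: the `[SEP]` marker, the function names, the pre-tokenization rule, and the logistic form of the decision statistic are all hypothetical choices. The transformer model is not implemented here; the embedding vector is assumed to come from a separately trained model and is passed in as a plain list of floats.

```python
import math
import re
from collections import Counter

SEP = "[SEP]"  # hypothetical marker separating URL tokens from metadata tokens


def pre_tokenize(url, metadata):
    """Split the URL and its metadata into coarse tokens in one joint stream."""
    url_tokens = [t for t in re.split(r"([/:.?=&\-_])", url) if t]
    meta_tokens = metadata.lower().split()
    return url_tokens + [SEP] + meta_tokens


def apply_merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_joint_bpe(streams, num_merges):
    """Learn BPE merges over URL and metadata tokens together (one vocabulary)."""
    words = Counter()
    for stream in streams:
        for tok in stream:
            # Start from characters; the separator stays a single symbol.
            words[(tok,) if tok == SEP else tuple(tok)] += 1
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        merged = Counter()
        for word, freq in words.items():
            merged[apply_merge(word, best)] += freq
        words = merged
    return merges


def encode(stream, merges):
    """Apply the learned merges, in order, to a joint token stream."""
    encoded = []
    for tok in stream:
        symbols = (tok,) if tok == SEP else tuple(tok)
        for pair in merges:
            symbols = apply_merge(symbols, pair)
        encoded.extend(symbols)
    return encoded


def decision_statistic(embedding, weights, bias):
    """Assumed form: logistic function of a linear projection of the embedding."""
    z = sum(e * w for e, w in zip(embedding, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def remedial_action(score, threshold=0.5):
    """Return 'block' when the statistic indicates a malicious URL (assumed threshold)."""
    return "block" if score >= threshold else "allow"
```

A possible usage, with made-up inputs: build streams with `pre_tokenize("http://login.example.com/verify", "Sign in to your account")`, learn merges with `learn_joint_bpe(streams, 50)`, encode each stream, feed the encoding to a transformer to obtain an embedding, then call `remedial_action(decision_statistic(embedding, weights, bias))`. Learning the merges over both segments at once is what makes the encoding joint: substrings shared by URLs and metadata map to the same subword symbols.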