US 11,907,658 B2
User-agent anomaly detection using sentence embedding
Zhe Chen, Singapore (SG); Hewen Wang, Singapore (SG); Yuzhen Zhuo, Singapore (SG); Solomon kok how Teo, Singapore (SG); Shanshan Peng, Singapore (SG); Quan Jin Ferdinand Tang, Singapore (SG); Serafin Trujillo, Singapore (SG); Kenneth Bradley Snyder, San Jose, CA (US); Mandar Ganaba Gaonkar, San Jose, CA (US); and Omkumar Mahalingam, Santa Clara, CA (US)
Assigned to PayPal, Inc., San Jose, CA (US)
Filed by PayPal, Inc., San Jose, CA (US)
Filed on May 5, 2021, as Appl. No. 17/308,931.
Prior Publication US 2022/0358289 A1, Nov. 10, 2022
Int. Cl. G06F 40/279 (2020.01); G06F 21/55 (2013.01); G06F 21/56 (2013.01); G06F 40/284 (2020.01); G06F 40/30 (2020.01); G06F 18/2321 (2023.01); G06F 18/2415 (2023.01)
CPC G06F 40/284 (2020.01) [G06F 18/2321 (2023.01); G06F 18/2415 (2023.01); G06F 40/30 (2020.01)]
20 Claims
 
8. A method comprising:
receiving, by a computer system, a request from a user-agent application of a device to access at least one resource associated with a service provider system;
extracting, from the request, an identifier of the user-agent application, wherein the identifier comprises a character string;
generating, from the character string, a plurality of character n-grams based on a plurality of word sizes, wherein the plurality of character n-grams comprises a first set of character n-grams corresponding to a first word size and a second set of character n-grams corresponding to a second word size;
determining a plurality of hash values based on performing one or more hash functions on the plurality of character n-grams;
converting, by the computer system, the character string into a numerical data vector representation of the user-agent application based on the plurality of hash values, wherein the converting comprises transforming each of the plurality of hash values into a numerical value within the numerical data vector representation of the user-agent application;
calculating, by the computer system and for the user-agent application, a predictive score based on the numerical data vector representation, wherein the predictive score indicates whether the identifier of the user-agent application corresponds to an anomaly based on a probability distribution function that models patterns learned from historic data associated with a plurality of user-agent applications that have requested access to the at least one resource associated with the service provider system;
comparing, by the computer system, the predictive score to a threshold; and
based on the comparing, classifying, by the computer system, the user-agent application as non-fraudulent or fraudulent.
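
The steps recited in claim 8 can be illustrated with a minimal sketch. This is not the patented implementation: the n-gram sizes (3 and 4), the SHA-256 hash, the 64-dimensional count vector, the per-dimension Gaussian used as the probability distribution function, and every function and class name below are assumptions chosen only for illustration.

```python
import hashlib
import math

def char_ngrams(text, sizes=(3, 4)):
    """Character n-grams for each word size (sizes are illustrative)."""
    return [text[i:i + n] for n in sizes for i in range(len(text) - n + 1)]

def hash_bucket(ngram, dim):
    """Map an n-gram to a vector index (SHA-256 is an assumed hash choice)."""
    digest = hashlib.sha256(ngram.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % dim

def embed(user_agent, dim=64, sizes=(3, 4)):
    """Turn a user-agent string into a unit-length vector of hashed n-gram counts."""
    vec = [0.0] * dim
    for gram in char_ngrams(user_agent, sizes):
        vec[hash_bucket(gram, dim)] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

class DiagonalGaussian:
    """Hypothetical stand-in for the learned probability distribution function:
    an independent Gaussian per vector dimension, fitted to historic vectors."""

    def __init__(self, vectors, eps=1e-3):
        n, dim = len(vectors), len(vectors[0])
        self.mean = [sum(v[j] for v in vectors) / n for j in range(dim)]
        # eps keeps the variance positive for dimensions that never vary.
        self.var = [
            sum((v[j] - self.mean[j]) ** 2 for v in vectors) / n + eps
            for j in range(dim)
        ]

    def score(self, vec):
        """Negative log-likelihood; a higher score means a more unusual vector."""
        return sum(
            0.5 * math.log(2 * math.pi * s) + (x - m) ** 2 / (2 * s)
            for x, m, s in zip(vec, self.mean, self.var)
        )

def classify(user_agent, model, threshold):
    """Compare the predictive score to a threshold and label the user agent."""
    score = model.score(embed(user_agent))
    return "fraudulent" if score > threshold else "non-fraudulent"

if __name__ == "__main__":
    # Made-up historic identifiers; a real system would use logged requests.
    history = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/90.0",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/91.0",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/92.0",
    ]
    model = DiagonalGaussian([embed(ua) for ua in history])
    # Assumed threshold: the highest score seen on the historic data.
    threshold = max(model.score(embed(ua)) for ua in history)
    for ua in history[:1] + ["zzzzzzzz"]:
        print(ua, "->", classify(ua, model, threshold))
```

In this sketch the threshold is simply the highest score observed on the historic identifiers; the claim does not say how the threshold is chosen, so a deployed system would presumably tune it against labeled traffic.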