US 12,001,548 B2
Threat detection using machine learning query analysis
Liron Ben Kimon, Tel Aviv (IL); and Yuri Shafet, Tel Aviv (IL)
Assigned to PAYPAL, INC.
Filed by PAYPAL, INC., San Jose, CA (US)
Filed on Jun. 25, 2019, as Appl. No. 16/451,170.
Prior Publication US 2020/0410091 A1, Dec. 31, 2020
Int. Cl. H04L 29/06 (2006.01); G06F 16/245 (2019.01); G06F 21/55 (2013.01); G06N 20/00 (2019.01)

CPC G06F 21/552 (2013.01) [G06F 16/245 (2019.01); G06F 21/554 (2013.01); G06N 20/00 (2019.01); G06F 2221/034 (2013.01)]

20 Claims

1. A method, comprising:

accessing a first plurality of user-generated database queries executed by a plurality of different users on one or more databases;

extracting, from each user-generated database query in the first plurality of user-generated database queries, a corresponding set of features representing different characteristics of the user-generated database query, wherein the different characteristics comprise at least one of a length of the user-generated database query, a format of the user-generated database query, or a styling of the user-generated database query;

creating a set of artificial intelligence (AI) training data based on the corresponding sets of features;

training a machine learning (ML) classifier using the set of AI training data and corresponding labels for each of the first plurality of user-generated database queries, wherein each of the corresponding labels indicates an identity of one of the plurality of different users that is associated with a corresponding query, and wherein the trained ML classifier is configured to produce vector outputs in a vector space in response to receiving database queries;

extracting, by a computer system from a first user-generated database query associated with a first user, a first set of features representing the different characteristics of the first user-generated database query;

obtaining a first output vector in the vector space based on providing the first set of features to the trained ML classifier; and

based on the first output vector, determining, by the computer system, if the first user-generated database query represents a data access anomaly.
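
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not name a classifier, a feature set, or a decision rule, so everything below is an assumption chosen for brevity. The four features (log length, line breaks, keyword capitalisation, trailing semicolon), the nearest-centroid classifier with a softmax over negative distances, the 0.5 threshold, and the names extract_features, train, output_vector, and is_anomaly are all hypothetical. The output vector here holds one score per known user, and a query is flagged when the score for the user who submitted it is low.

```python
import math
import re
from collections import defaultdict

# Hypothetical keyword list used only to measure capitalisation style.
KEYWORDS = re.compile(r"\b(select|from|where|join|group|order|by|and|or)\b", re.I)


def extract_features(query):
    """Map one query to length, format, and styling features (illustrative choices)."""
    keywords = KEYWORDS.findall(query)
    upper = sum(1 for k in keywords if k.isupper())
    return [
        math.log1p(len(query)),                        # length of the query
        float(query.count("\n")),                      # format: number of line breaks
        upper / len(keywords) if keywords else 0.0,    # styling: share of upper-case keywords
        1.0 if query.rstrip().endswith(";") else 0.0,  # styling: trailing semicolon
    ]


def train(queries, labels):
    """Fit one feature centroid per user; labels are the user identities."""
    grouped = defaultdict(list)
    for query, user in zip(queries, labels):
        grouped[user].append(extract_features(query))
    users = sorted(grouped)
    centroids = [
        [sum(column) / len(grouped[user]) for column in zip(*grouped[user])]
        for user in users
    ]
    return users, centroids


def output_vector(model, query):
    """Return one score per known user (softmax over negative centroid distances)."""
    _, centroids = model
    features = extract_features(query)
    scores = [-math.dist(features, centroid) for centroid in centroids]
    top = max(scores)
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def is_anomaly(model, query, claimed_user, threshold=0.5):
    """Flag the query when the score for the claimed user falls below the threshold."""
    users, _ = model
    vector = output_vector(model, query)
    return vector[users.index(claimed_user)] < threshold


# Toy training data: two users with visibly different query-writing habits.
training_queries = [
    "SELECT id FROM users;",
    "SELECT name FROM orders WHERE id = 1;",
    "select *\nfrom users\nwhere id = 2",
    "select name\nfrom orders",
]
training_labels = ["alice", "bob", "bob", "bob"]
training_labels[1] = "alice"  # first two queries belong to alice, last two to bob

model = train(training_queries, training_labels)
new_query = "SELECT email FROM users;"
print(is_anomaly(model, new_query, "alice"))  # query style matches alice's history
print(is_anomaly(model, new_query, "bob"))    # same query submitted under bob's account
```

In this toy setup the same query is treated as normal when it arrives under the account whose style it matches and as a data access anomaly when it arrives under a different account. A production system would presumably use a richer feature set and a trained classifier from an ML library rather than centroids.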