US 12,034,732 B2
System, method, and computer program for automatically classifying user accounts in a computer network using keys from an identity management system
Derek Lin, San Mateo, CA (US); Barry Steiman, San Ramon, CA (US); Domingo Mihovilovic, Menlo Park, CA (US); and Sylvain Gil, San Francisco, CA (US)
Assigned to Exabeam, Inc., Foster City, CA (US)
Filed by Exabeam, Inc., Foster City, CA (US)
Filed on Sep. 17, 2021, as Appl. No. 17/478,805.
Application 17/478,805 is a continuation of application No. 15/058,034, filed on Mar. 1, 2016, granted, now Pat. No. 11,140,167.
Prior Publication US 2022/0006814 A1, Jan. 6, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 20/00 (2019.01); H04L 9/40 (2022.01)

CPC H04L 63/102 (2013.01) [G06N 20/00 (2019.01)]

15 Claims

1. A method, performed by a computer system, for automatically classifying user accounts in an entity's IT network, wherein the user accounts are classified using identity management key-value pairs from an identity management data structure, the method comprising:

training a statistical model to map individual identity management key-value pairs or sets of identity management key-value pairs to a probability of being associated with a service user account, wherein a key in the identity management key-value pair is a textual string that represents a field in a directory, maintained by an identity management system, comprising one or more accounts on the entity's IT network, wherein a value in the identity management key-value pair is a corresponding value to the field in the directory, and wherein the statistical model is trained using a set of inputs and a target variable and wherein training the model comprises:

parsing account data from an output text file stored in or hosted on the identity management system associated with user accounts manually classified as the service user accounts or human user accounts to obtain dynamically-specified identity management key-value pairs that are used as the inputs in the statistical model, and

setting the target variable in the statistical model to be whether the user account is a service user account;

using machine-learning-based modeling to automatically determine whether an unclassified user account is a service user account by performing the following:

identifying identity management key-value pairs, from the identity management system, associated with the unclassified user account,

representing the unclassified user account as an N-dimensional vector of the identity management key-value pairs, wherein N is the number of the identity management key-value pairs associated with the unclassified user account,

inputting the N-dimensional vector into the statistical model to calculate a probability that the unclassified user account is a service user account, and

in response to the probability exceeding a threshold, classifying the unclassified user account as a service user account;

using account classification results from the machine-learning-based modeling to construct and evaluate context-specific rules, wherein the context-specific rules identify one or more user accounts that are classified as service user account(s) but are known in the system to be human user account(s), wherein for the one or more user accounts that are classified as service user account(s) but are known in the system to be human user account(s), performing the following steps:

identifying a probability score associated with an equal error rate (EER), wherein the EER is the rate at which false positives equal false negatives,

setting the threshold to the probability score associated with the EER, and

in response to the probability exceeding the threshold, classifying the human user account(s) as service user account(s); and

using the context-specific rules to improve security analytics alert accuracy in an IT network.
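
The training, scoring, and threshold-setting steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not name a model family, so a naive-Bayes-style log-odds model with add-one smoothing is assumed here. The field names (accountType, pwdNeverExpires, department), the "key: value" record layout, and all function names are hypothetical. The equal error rate is approximated as the score at which the false-positive rate and the false-negative rate are closest.

```python
import math
from collections import Counter


def parse_pairs(record):
    """Parse 'key: value' lines of one account record into key-value pairs.

    The returned list of N pairs stands in for the N-dimensional vector.
    """
    pairs = []
    for line in record.strip().splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no 'key: value' separator
            pairs.append((key.strip(), value.strip()))
    return pairs


def train(labeled):
    """Count how often each pair occurs in service and in human accounts.

    labeled: list of (pairs, is_service) from manually classified accounts.
    """
    model = {"svc": Counter(), "human": Counter(), "n_svc": 0, "n_human": 0}
    for pairs, is_service in labeled:
        side = "svc" if is_service else "human"
        model[side].update(set(pairs))
        model["n_" + side] += 1
    return model


def service_probability(model, pairs):
    """Combine per-pair evidence into P(service account) via log-odds."""
    log_odds = math.log((model["n_svc"] + 1) / (model["n_human"] + 1))
    for pair in set(pairs):
        p_svc = (model["svc"][pair] + 1) / (model["n_svc"] + 2)
        p_human = (model["human"][pair] + 1) / (model["n_human"] + 2)
        log_odds += math.log(p_svc / p_human)
    return 1.0 / (1.0 + math.exp(-log_odds))


def eer_threshold(scores, labels):
    """Return the score where false-positive and false-negative rates are closest."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    best_t, best_gap = None, float("inf")
    for t in sorted(set(scores)):
        fp = sum(1 for s, y in zip(scores, labels) if s > t and not y)
        fn = sum(1 for s, y in zip(scores, labels) if s <= t and y)
        gap = abs(fp / n_neg - fn / n_pos)
        if gap < best_gap:
            best_t, best_gap = t, gap
    return best_t


# Hypothetical, manually classified training records (True = service account).
records = [
    ("accountType: service\npwdNeverExpires: TRUE", True),
    ("accountType: service\npwdNeverExpires: TRUE\ndepartment: IT", True),
    ("accountType: employee\npwdNeverExpires: FALSE\ndepartment: Sales", False),
    ("accountType: employee\npwdNeverExpires: FALSE\ndepartment: IT", False),
]
labeled = [(parse_pairs(text), is_service) for text, is_service in records]
model = train(labeled)

# Set the decision threshold from the scores of the labeled accounts.
scores = [service_probability(model, pairs) for pairs, _ in labeled]
labels = [is_service for _, is_service in labeled]
threshold = eer_threshold(scores, labels)

# Classify an unclassified account: service if probability exceeds threshold.
unknown = parse_pairs("accountType: service\npwdNeverExpires: TRUE")
probability = service_probability(model, unknown)
is_service_account = probability > threshold
print(probability, threshold, is_service_account)
```

In this sketch the threshold is derived from the same labeled accounts used for training, purely to keep the example short; a held-out labeled set would normally be used for that step.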