US 11,818,147 B2
Overly optimistic data patterns and learned adversarial latent features
Scott Michael Zoldi, San Diego, CA (US); and Shafi Ur Rahman, San Diego, CA (US)
Assigned to Fair Isaac Corporation, Minneapolis, MN (US)
Filed by FAIR ISAAC CORPORATION, Roseville, MN (US)
Filed on Nov. 23, 2020, as Appl. No. 17/102,295.
Prior Publication US 2022/0166782 A1, May 26, 2022
Int. Cl. H04L 9/40 (2022.01); G06N 3/04 (2023.01); G06N 3/08 (2023.01)

CPC H04L 63/1416 (2013.01) [G06N 3/04 (2013.01); G06N 3/08 (2013.01)]

20 Claims

17. A system for improving security of a computer-implemented artificial intelligence system by detecting adversarial attacks on a machine learning decision model, the system configured for:

monitoring one or more transactions received by the machine learning decision model;

receiving a first score generated by the machine learning decision model in association with a first transaction;

identifying the first transaction as likely adversarial, in response to the first score being lower than a certain score threshold and the first transaction having a low occurrence likelihood;

receiving a second score generated by an adversary detection model in association with the first transaction based on one or more adversarial latent features associated with the first transaction as detectable by the adversary detection model;

determining at least one adversarial latent transaction feature being exploited by the first transaction, in response to determining that the second score falls above the certain score threshold;

detecting an abnormal volume of activations of the at least one adversarial latent feature spanning across a plurality of transactions scored by the adversary detection model; and

blocking transactions, received by the machine learning decision model, that match the at least one adversarial latent feature,

the first score is based on one or more features extracted from the first transaction, by a feature extraction module, and related transaction history,

the first transaction is determined to have a low occurrence likelihood based on comparing attributes of the first transaction with attributes of one or more other transactions,

the adversary detection model is implemented based on newly labeled adversarial transactions during the training phase to determine the latent features describing feature relationships in an adversarial space, and the one or more extracted features for a second transaction in a training dataset are binned into ranges of values observed in the training dataset, and

a feature f is binned in ranges, f_i, indexed by i, such that a probability distribution of scores is attached to a binned value of the feature, f_i, and a calculated score distribution probability density function for the feature f is calculated based on a likelihood of observing a given score Sj for its observed value being in the feature bin i, given by score distribution probability density function:

$$P(S_j \mid f_i) = \frac{C_{ij}}{\sum_{k} C_{ik}}$$

where C_ij is a count of observations of the score bin j and feature bin i.
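
As an illustration only, the binned score distribution recited in the claim can be sketched in Python. The sketch assumes that the likelihood of score bin j given feature bin i is the count C_ij normalized over all score bins for that feature bin. The function names, bin edges, and toy data below are hypothetical and are not taken from the patent.

```python
from bisect import bisect_right
from collections import defaultdict


def bin_index(value, edges):
    """Return the index of the bin holding value, given sorted interior bin edges."""
    return bisect_right(edges, value)


def score_distribution(feature_values, scores, feature_edges, score_edges):
    """Estimate P(S_j | f_i) as C_ij divided by the sum of C_ik over score bins k."""
    counts = defaultdict(lambda: defaultdict(int))  # counts[i][j] holds C_ij
    for f, s in zip(feature_values, scores):
        i = bin_index(f, feature_edges)
        j = bin_index(s, score_edges)
        counts[i][j] += 1
    pdf = {}
    for i, row in counts.items():
        total = sum(row.values())
        pdf[i] = {j: c / total for j, c in row.items()}
    return pdf


def likelihood(pdf, i, j):
    """Look up P(S_j | f_i); unseen (i, j) combinations get probability 0.0."""
    return pdf.get(i, {}).get(j, 0.0)


# Hypothetical training observations: one feature value and one model score each.
feature_values = [5.0, 7.0, 8.0, 25.0, 30.0, 40.0]
scores = [0.1, 0.2, 0.9, 0.8, 0.7, 0.95]
feature_edges = [10.0]  # feature bin 0: below 10, feature bin 1: 10 and above
score_edges = [0.5]     # score bin 0: below 0.5, score bin 1: 0.5 and above

pdf = score_distribution(feature_values, scores, feature_edges, score_edges)
low_score_in_high_bin = likelihood(pdf, 1, 0)
```

In this toy data a low score never occurs for feature bin 1, so `likelihood(pdf, 1, 0)` is zero. A transaction with that combination would be an unusual observation under this estimate, which is one plausible way to read "low occurrence likelihood"; the claim itself does not fix how that likelihood is thresholded.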