US 12,328,331 B1
Detection of privacy attacks on machine learning models
Hengrui Jia, Toronto (CA); Ravikumar Balakrishnan, Beaverton, OR (US); Zeliang Kan, London (GB); and Jason Martin, Beaverton, OR (US)
Assigned to HiddenLayer, Inc., Austin, TX (US)
Filed by HiddenLayer, Inc., Austin, TX (US)
Filed on Feb. 4, 2025, as Appl. No. 19/045,368.
Int. Cl. H04L 9/40 (2022.01)
CPC H04L 63/1425 (2013.01) [H04L 63/1416 (2013.01); H04L 63/1441 (2013.01)]
29 Claims
OG exemplary drawing
 
1. A computer-implemented method comprising:
inputting each of a plurality of queries into an artificial intelligence (AI) model comprising a plurality of layers including an input layer, an output layer, and at least one intermediate layer between the input layer and the output layer, each intermediate layer, during inference, outputting a plurality of activations;
intercepting, for each query, activations from at least one of the intermediate layers;
determining whether a distribution of the intercepted activations across the queries indicates that the queries seek to cause the AI model to behave in an undesired manner by conducting a distance-based similarity analysis between the intercepted activations and reference activations, the distance-based similarity analysis comprising:
generating a first metric which characterizes overall divergence between the intercepted activations and the reference activations; and
generating a second metric which groups queries by class predictions and calculates a distance for each group relative to reference activations of a same class; and
providing data characterizing the determining to a consuming application or process.
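
The interception step of claim 1 can be realized with standard framework hooks. Below is a minimal sketch assuming a PyTorch model; the ActivationInterceptor class, the layer names, and the toy Sequential model are illustrative assumptions rather than elements of the patent, and torch.nn forward hooks stand in for whatever interception mechanism an implementation actually uses.

```python
import torch
import torch.nn as nn

class ActivationInterceptor:
    """Collects intermediate-layer activations for each query via forward hooks.
    (Hypothetical helper; not taken from the patent.)"""

    def __init__(self, model: nn.Module, layer_names: list[str]):
        self.model = model
        self.activations: dict[str, torch.Tensor] = {}
        self._handles = []
        modules = dict(model.named_modules())
        for name in layer_names:
            handle = modules[name].register_forward_hook(self._make_hook(name))
            self._handles.append(handle)

    def _make_hook(self, name: str):
        def hook(module, inputs, output):
            # Detach so the stored activations do not retain the autograd graph.
            self.activations[name] = output.detach()
        return hook

    def remove(self):
        for handle in self._handles:
            handle.remove()

# Usage: run a batch of queries and read off the hooked intermediate activations.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
interceptor = ActivationInterceptor(model, layer_names=["1"])  # hook the ReLU layer
queries = torch.randn(8, 16)            # 8 queries, 16 features each
logits = model(queries)                 # inference pass fills interceptor.activations
hidden = interceptor.activations["1"]   # shape (8, 32): intercepted activations
```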
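
The two metrics of the distance-based similarity analysis can be sketched similarly. The claim does not fix a particular distance function, so the sketch below assumes Euclidean distance between activation centroids; the function names, the flagging rule, and the threshold values are all hypothetical, and the returned dictionary merely stands in for the "data characterizing the determining" provided to a consuming application or process.

```python
import numpy as np

def overall_divergence(intercepted: np.ndarray, reference: np.ndarray) -> float:
    """First metric: overall divergence between intercepted and reference
    activations, here the Euclidean distance between the two activation
    centroids (an assumption; the claim does not specify a distance)."""
    return float(np.linalg.norm(intercepted.mean(axis=0) - reference.mean(axis=0)))

def per_class_distances(intercepted, predicted_classes, reference, reference_classes):
    """Second metric: group queries by predicted class and measure each group's
    distance to the reference activations of the same class."""
    distances = {}
    for cls in np.unique(predicted_classes):
        group = intercepted[predicted_classes == cls]
        ref_group = reference[reference_classes == cls]
        if len(ref_group) == 0:
            continue  # no reference activations of this class to compare against
        distances[int(cls)] = float(
            np.linalg.norm(group.mean(axis=0) - ref_group.mean(axis=0))
        )
    return distances

def flag_queries(intercepted, predicted_classes, reference, reference_classes,
                 overall_threshold=5.0, class_threshold=5.0):
    """Combine both metrics into a simple flagging decision. Thresholds are
    illustrative assumptions, not values from the patent."""
    first = overall_divergence(intercepted, reference)
    second = per_class_distances(intercepted, predicted_classes,
                                 reference, reference_classes)
    suspicious = first > overall_threshold or any(
        d > class_threshold for d in second.values()
    )
    return {"overall_divergence": first,
            "per_class_distances": second,
            "suspicious": suspicious}
```

In practice the reference activations would presumably be recorded from known-benign traffic, with the thresholds calibrated against that benign distribution before the detector is deployed.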