US 11,893,111 B2
Defending machine learning systems from adversarial attacks
Srinivas Kruthiveti Subrahmanyeswara Sai, Bangalore (IN); Aashish Kumar, Bangalore (IN); Alexander Kreines, Jerusalem (IL); George Jose, Bengaluru (IN); Sambuddha Saha, Burdwan (IN); Nir Morgulis, Petah Tikwa (IL); and Shachar Mendelowitz, Tel Aviv (IL)
Assigned to Harman International Industries, Incorporated, Stamford, CT (US)
Filed by HARMAN INTERNATIONAL INDUSTRIES, INCORPORATED, Stamford, CT (US)
Filed on Nov. 26, 2019, as Appl. No. 16/696,144.
Prior Publication US 2021/0157912 A1, May 27, 2021
Int. Cl. G06F 21/55 (2013.01); G06N 20/00 (2019.01); G06N 3/04 (2023.01)
CPC G06F 21/554 (2013.01) [G06N 3/04 (2013.01); G06N 20/00 (2019.01); G06F 2221/034 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A computer-implemented method for detecting adversarial attacks on a machine-learning (ML) system, the method comprising:
receiving, by an ML model of the ML system, input data;
processing, by the ML model, the input data to generate output data;
receiving, by an adversarial detection module of the ML system, both the input data and the output data;
inputting a perturbed input data and the output data into a neural fingerprinting model included in the adversarial detection module, wherein the perturbed input data is generated by introducing a set of predefined random perturbations into the input data;
generating, by the neural fingerprinting model, a perturbed output data based on the perturbed input data;
determining, using the neural fingerprinting model, an adversarial score indicating whether the perturbed output data matches an expected perturbed output data for a class of data associated with the input data and the output data; and
performing one or more remedial actions based on the adversarial score.
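A minimal sketch of the detection flow recited in claim 1, under loose assumptions: a toy softmax classifier stands in for both the ML model and the neural fingerprinting model, expected perturbed outputs are recorded offline from one clean reference input per class, and the names model, deltas, fingerprints, adversarial_score, and THRESHOLD are illustrative, not taken from the patent.

import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the ML model: a fixed linear-softmax classifier
# over 8 input features and 3 classes (hypothetical dimensions).
W = rng.normal(size=(3, 8))

def model(x):
    """Return softmax class probabilities for input vector x."""
    z = W @ x
    e = np.exp(z - z.max())
    return e / e.sum()

# Set of predefined random perturbations introduced into the input data.
deltas = [0.05 * rng.normal(size=8) for _ in range(4)]

# Expected perturbed output data per class, recorded offline from a
# clean reference input of each class (illustrative setup).
references = {c: rng.normal(size=8) for c in range(3)}
fingerprints = {
    c: [model(x_ref + dx) - model(x_ref) for dx in deltas]
    for c, x_ref in references.items()
}

def adversarial_score(x):
    """Mean distance between the observed output changes under the
    predefined perturbations and the expected changes for the class
    associated with the input: low means the input matches its
    class fingerprint, high suggests an adversarial input."""
    y = model(x)
    c = int(np.argmax(y))
    dists = [
        np.linalg.norm((model(x + dx) - y) - fingerprints[c][i])
        for i, dx in enumerate(deltas)
    ]
    return float(np.mean(dists))

# Example: score an input and take a remedial action on mismatch.
x_test = rng.normal(size=8)
score = adversarial_score(x_test)
THRESHOLD = 0.1  # would be tuned on held-out clean data
if score > THRESHOLD:
    print(f"score={score:.4f}: flag input as potentially adversarial")
else:
    print(f"score={score:.4f}: input consistent with class fingerprint")

In the patent, the neural fingerprinting model is a separate component of the adversarial detection module rather than the ML model itself; the sketch collapses the two into one callable only to keep the example self-contained.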