US 11,727,109 B2
Identifying adversarial attacks with advanced subset scanning
Skyler Speakman, Nairobi (KE); Celia Cintas, Nairobi (KE); Victor Abayomi Akinwande, Nairobi (KE); Srihari Sridharan, Nairobi (KE); and Komminist Weldemariam, Ottawa (CA)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Jan. 30, 2020, as Appl. No. 16/776,554.
Claims priority of provisional application 62/965,309, filed on Jan. 24, 2020.
Prior Publication US 2021/0232931 A1, Jul. 29, 2021
Int. Cl. G06F 21/52 (2013.01); G06T 19/00 (2011.01); G06N 3/047 (2023.01); G06N 3/02 (2006.01); G06N 3/088 (2023.01); G06N 20/00 (2019.01); G06F 18/23 (2023.01); G06F 18/2411 (2023.01); G06V 10/764 (2022.01); G06V 10/776 (2022.01); G06V 10/82 (2022.01)
CPC G06F 21/52 (2013.01) [G06F 18/23 (2023.01); G06F 18/2411 (2023.01); G06N 3/047 (2023.01); G06N 3/088 (2013.01); G06N 20/00 (2019.01); G06T 19/006 (2013.01); G06V 10/764 (2022.01); G06V 10/776 (2022.01); G06V 10/82 (2022.01); G06N 3/02 (2013.01)]
20 Claims
1. A method for detecting whether a given item within input data is anomalous, the method comprising steps of:
extracting activations at respective nodes of an autoencoder trained on the data, the activations comprising activations at the respective nodes for the given item within the data and for one or more other items within the data;
calculating p-values corresponding to the respective nodes, wherein the p-value for a given node represents a proportion of the activations at the given node for the other items which are greater than the activations at the given node for the given item;
determining at least one score at least in part by scanning for a subset of the respective nodes which maximizes a statistical scoring function applied to the corresponding p-values, wherein determining the at least one score comprises determining a respective score for each inner layer of the autoencoder; and
providing a visualization of at least the given item within the input data to a human user, wherein the visualization comprises unsupervised clustering in latent representation space of the autoencoder and the visualization further comprises one or more tailored mitigation actions suggested when a risk score exceeds a threshold, the risk score exceeding the threshold indicating that the given item is anomalous and corresponds to an adversarial attack.
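The p-value and scanning steps of claim 1 can be illustrated with a minimal Python sketch. It assumes the activations have already been extracted from the autoencoder into NumPy arrays. The claim does not name a particular statistical scoring function, so the Berk-Jones statistic is used here purely as an assumed example, and the function names (empirical_p_values, berk_jones, scan_nodes, score_layers) and the alpha grid are hypothetical. The visualization, clustering, and mitigation steps are not covered.

```python
import numpy as np


def empirical_p_values(background, item):
    """Per-node p-value: the proportion of background activations at a node
    that are greater than the given item's activation at that node.

    background: array of shape (n_other_items, n_nodes)
    item:       array of shape (n_nodes,)
    """
    return (background > item).mean(axis=0)


def berk_jones(n_alpha, n, alpha):
    """Berk-Jones statistic (assumed choice of scoring function):
    n * KL(n_alpha / n || alpha), taken as 0 when the observed proportion
    of significant p-values does not exceed alpha."""
    obs = n_alpha / n
    if obs <= alpha:
        return 0.0
    score = obs * np.log(obs / alpha)
    if obs < 1.0:
        score += (1.0 - obs) * np.log((1.0 - obs) / (1.0 - alpha))
    return float(n * score)


def scan_nodes(p_values, alphas):
    """Scan for the subset of nodes that maximizes the scoring function.

    For a fixed alpha, Berk-Jones is maximized by the subset made of exactly
    the nodes with p <= alpha, so only one candidate subset per alpha is
    evaluated. Each alpha must lie strictly between 0 and 1.
    """
    best_score, best_subset = 0.0, np.array([], dtype=int)
    for alpha in alphas:
        subset = np.flatnonzero(p_values <= alpha)
        if subset.size == 0:
            continue
        score = berk_jones(subset.size, subset.size, alpha)
        if score > best_score:
            best_score, best_subset = score, subset
    return best_score, best_subset


def score_layers(layers, alphas):
    """Return one score per inner layer.

    layers: dict mapping a layer name to (background, item) activations.
    """
    return {
        name: scan_nodes(empirical_p_values(bg, item), alphas)[0]
        for name, (bg, item) in layers.items()
    }


if __name__ == "__main__":
    # Synthetic stand-in for extracted activations (not real model output).
    rng = np.random.default_rng(0)
    background = rng.normal(size=(200, 32))
    item = rng.normal(size=32)
    item[:5] += 4.0  # make five nodes unusually active for this item
    alphas = np.linspace(0.01, 0.5, 50)
    print(score_layers({"inner_layer_1": (background, item)}, alphas))
```

A higher score for a layer means that a group of its nodes shows activations for the given item that are unusually large relative to the other items. How per-layer scores are combined into the risk score and which threshold is applied are left open in this sketch.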