CPC G06F 21/566 (2013.01) [G06N 5/045 (2013.01); G06N 20/00 (2019.01); G06F 2221/033 (2013.01)] | 20 Claims |
1. A computer-implemented method for detecting backdoor poisoning of a machine-learned decision-making system (MLDMS), comprising:
receiving the MLDMS, wherein the MLDMS operates on input data samples to produce an output decision that leverages a set of parameters that are learned from a training dataset that may be backdoor-poisoned;
receiving a set of clean (unpoisoned) data samples that are mapped by the MLDMS to a plurality of output values;
using the MLDMS and the clean data samples, estimating a set of potential backdoor perturbations such that incorporating a potential backdoor perturbation into a subset of the clean data samples induces an output decision change;
comparing the set of potential backdoor perturbations to determine a candidate backdoor perturbation based on at least one of perturbation sizes and corresponding output changes; and
using the candidate backdoor perturbation to determine whether the MLDMS has been backdoor-poisoned.
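The detection procedure recited in claim 1 can be illustrated with a toy sketch (not the patented implementation; all names, the linear stand-in model, the optimization settings, and the anomaly threshold are assumptions chosen for illustration). For each output class, gradient descent estimates a minimal additive perturbation that drives the clean samples to that class; the perturbations are then compared by size, and an anomalously small one with a high output-change rate is taken as the candidate backdoor:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear softmax classifier standing in for the MLDMS.
# A large weight on feature 4 of class 2 simulates a planted backdoor:
# activating that (otherwise unused) feature flips inputs to class 2.
n_classes, dim = 3, 5
W = rng.normal(scale=0.5, size=(n_classes, dim))
W[2, 4] = 8.0          # simulated backdoor sensitivity (assumption)
b = np.zeros(n_classes)

def logits(x):
    return x @ W.T + b

def predict(x):
    return np.argmax(logits(x), axis=1)

# Clean (unpoisoned) samples; the trigger feature is inactive in clean data.
X_clean = rng.normal(size=(64, dim))
X_clean[:, 4] = 0.0

def estimate_perturbation(target, steps=300, lr=0.1, lam=0.05):
    """Estimate a minimal additive perturbation delta that induces an
    output-decision change to `target` (cross-entropy toward the target
    class plus an L1 size penalty, minimized by gradient descent)."""
    delta = np.zeros(dim)
    onehot = np.eye(n_classes)[target]
    for _ in range(steps):
        z = logits(X_clean + delta)
        z -= z.max(axis=1, keepdims=True)        # stable softmax
        p = np.exp(z)
        p /= p.sum(axis=1, keepdims=True)
        # d(mean cross-entropy)/d(delta) = (mean(p) - onehot) @ W
        grad = (p.mean(axis=0) - onehot) @ W
        grad += lam * np.sign(delta)             # L1 penalty subgradient
        delta -= lr * grad
    return delta

# One candidate perturbation per output class; compare sizes and the
# corresponding output-decision changes.
deltas = [estimate_perturbation(t) for t in range(n_classes)]
norms = np.array([np.abs(d).sum() for d in deltas])
flip_rates = np.array(
    [(predict(X_clean + d) == t).mean() for t, d in enumerate(deltas)]
)

# An anomalously small perturbation that still flips most clean samples
# is the candidate backdoor perturbation (0.5x-median cutoff is arbitrary).
candidate = int(np.argmin(norms))
suspicious = norms[candidate] < 0.5 * np.median(norms) and flip_rates[candidate] > 0.9
print(candidate, suspicious)
```

Under these assumptions the class-2 perturbation concentrates on the trigger feature and is far smaller than the perturbations needed to reach the other classes, so the system is flagged as backdoor-poisoned. A real detector would operate on the actual learned model and clean dataset rather than this linear stand-in.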