CPC H04L 63/1466 (2013.01) [G06F 18/217 (2023.01); G06F 18/2113 (2023.01); G06N 20/00 (2019.01); H04L 63/1441 (2013.01)] | 20 Claims |
1. A computer-implemented method for provenance-based defense against poison attacks, the method comprising:
receiving one or more observations from one or more data sources, wherein each observation comprises one or more features for training a final prediction model;
receiving provenance data corresponding to each observation;
determining whether some or all of the observations are poisoned based at least in part on the corresponding provenance data; and
in response to determining some or all of the observations are poisoned, removing the poisoned observation(s) from a final training dataset used to train the final prediction model, and
wherein determining whether each observation is poisoned comprises:
determining a provenance feature for the provenance data corresponding to each of the observations;
grouping observations characterized by a same provenance signature of the determined provenance feature;
generating a filtered training dataset excluding one or more of the groups of observations from the training dataset; and
training a first prediction model corresponding to the final prediction model using the filtered training dataset.
|
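For illustration only, the grouping and filtering steps recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation. The provenance feature name `device_id`, the toy one-dimensional nearest-centroid model, the small trusted evaluation set, and all function names are hypothetical. The claim text above also does not state how the first prediction model is used to decide that a group is poisoned, so the accuracy comparison used here is an assumed decision rule.

```python
from collections import defaultdict

# Hypothetical toy data: each observation is (feature_value, label).
clean_a = [(1.0, 0), (1.2, 0), (5.0, 1), (4.8, 1)]
clean_b = [(0.9, 0), (1.1, 0), (5.1, 1), (4.9, 1)]
# Label-flipped observations, all arriving from one source.
poison_c = [(5.0, 0), (5.2, 0), (4.8, 0), (5.0, 0), (5.1, 0), (4.9, 0),
            (1.0, 1), (1.1, 1), (0.9, 1), (1.0, 1), (1.2, 1), (0.8, 1)]
observations = clean_a + clean_b + poison_c

# One provenance record per observation; "device_id" is an assumed feature.
provenance = ([{"device_id": "sensor-a"} for _ in clean_a]
              + [{"device_id": "sensor-b"} for _ in clean_b]
              + [{"device_id": "sensor-c"} for _ in poison_c])

# Assumed small trusted set, used only by the assumed decision rule below.
trusted = [(1.0, 0), (5.0, 1), (1.1, 0), (4.9, 1)]


def group_by_signature(provenance, feature):
    """Map each signature of one provenance feature to observation indices."""
    groups = defaultdict(list)
    for index, record in enumerate(provenance):
        groups[record[feature]].append(index)
    return dict(groups)


def exclude_groups(observations, groups, excluded_signatures):
    """Return the dataset without the observations of the excluded groups."""
    excluded = {i for sig in excluded_signatures for i in groups[sig]}
    return [obs for i, obs in enumerate(observations) if i not in excluded]


def train_centroid_model(dataset):
    """Toy stand-in for a prediction model: 1-D nearest class centroid."""
    sums, counts = defaultdict(float), defaultdict(int)
    for value, label in dataset:
        sums[label] += value
        counts[label] += 1
    centroids = {label: sums[label] / counts[label] for label in counts}
    return lambda value: min(centroids, key=lambda c: abs(value - centroids[c]))


def accuracy(model, dataset):
    """Fraction of (value, label) pairs the model predicts correctly."""
    return sum(model(value) == label for value, label in dataset) / len(dataset)


groups = group_by_signature(provenance, "device_id")
baseline = accuracy(train_centroid_model(observations), trusted)

flagged = []
for signature in groups:
    # Filtered training dataset: everything except this provenance group.
    filtered = exclude_groups(observations, groups, [signature])
    first_model = train_centroid_model(filtered)
    # Assumed rule: a group is suspect if removing it improves accuracy.
    if accuracy(first_model, trusted) > baseline:
        flagged.append(signature)

# Final training dataset with the flagged groups removed.
final_training = exclude_groups(observations, groups, flagged)
```

In this toy run, only the group with signature `sensor-c` is flagged, and the final training dataset keeps the eight observations from the other two sources. A real system would substitute its own model, provenance features, and decision criterion.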