US 12,462,018 B2
System and method for detecting poisoned training data based on characteristics of updated artificial intelligence models
Ofir Ezrielev, Be'er Sheva (IL); Amihai Savir, Newton, MA (US); and Tomer Kushnir, Omer (IL)
Assigned to Dell Products L.P., Round Rock, TX (US)
Filed by Dell Products L.P., Round Rock, TX (US)
Filed on Dec. 29, 2022, as Appl. No. 18/147,773.
Prior Publication US 2024/0220608 A1, Jul. 4, 2024
Int. Cl. G06F 21/55 (2013.01); G06N 3/096 (2023.01)
CPC G06F 21/55 (2013.01) [G06N 3/096 (2023.01)]
20 Claims
 
1. A method for managing an artificial intelligence (AI) model, the method comprising:
  obtaining a second instance of the AI model, the second instance of the AI model comprising a first portion being trained using a known good set of training data and a second portion being trained using a suspect set of training data, and obtaining the second instance of the AI model comprises:
    performing a transfer learning process using the suspect set of training data and a first instance of the AI model to obtain the second instance of the AI model, the first instance of the AI model previously being trained, at least in part, using the known good set of training data, and the transfer learning process comprises:
      obtaining the first instance of the AI model;
      freezing a first portion of the first instance of the AI model to obtain a partially frozen AI model; and
      training the partially frozen AI model using the suspect set of training data to obtain the second instance of the AI model;
  performing an analysis of the second instance of the AI model to obtain a quantification, the quantification indicating a likelihood that the suspect set of training data comprises poisoned training data;
  making a determination regarding whether the quantification exceeds a quantification threshold;
  in a first instance of the determination in which the quantification exceeds the quantification threshold:
    treating the suspect set of training data as comprising the poisoned training data; and
  in a second instance of the determination in which the quantification does not exceed the quantification threshold:
    treating the suspect set of training data as not comprising the poisoned training data.
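
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything below is an assumption made for illustration: the toy model (a dictionary holding one frozen weight and a trainable head), the helper names `transfer_learn`, `quantify`, and `treat_as_poisoned`, the learning rate, and the threshold value are all hypothetical. In particular, the claim does not say how the quantification is computed; the sketch assumes it is the distance the unfrozen parameters move away from the first instance during transfer learning.

```python
import math


def transfer_learn(first_instance, suspect_data, lr=0.05, epochs=200):
    """Freeze the first portion and train the remainder on the suspect data.

    Assumed toy model: prediction = head_w * (frozen_w * x) + head_b.
    `frozen_w` stands in for the frozen first portion; `head_w` and
    `head_b` stand in for the portion retrained on the suspect set.
    """
    second = dict(first_instance)  # copy, so the first instance is untouched
    n = len(suspect_data)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in suspect_data:
            hidden = second["frozen_w"] * x          # frozen portion: never updated
            err = second["head_w"] * hidden + second["head_b"] - y
            grad_w += 2.0 * err * hidden             # d(err^2)/d(head_w)
            grad_b += 2.0 * err                      # d(err^2)/d(head_b)
        # Gradient step on the unfrozen parameters only.
        second["head_w"] -= lr * grad_w / n
        second["head_b"] -= lr * grad_b / n
    return second


def quantify(first_instance, second_instance):
    """Assumed quantification: L2 drift of the unfrozen parameters."""
    dw = second_instance["head_w"] - first_instance["head_w"]
    db = second_instance["head_b"] - first_instance["head_b"]
    return math.sqrt(dw * dw + db * db)


def treat_as_poisoned(quantification, threshold):
    """Return True when the quantification exceeds the threshold."""
    return quantification > threshold


# Hypothetical first instance, assumed to have learned y = 2x from known good data.
first_instance = {"frozen_w": 1.0, "head_w": 2.0, "head_b": 0.0}

xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
clean_suspect = [(x, 2.0 * x) for x in xs]     # consistent with the known good data
flipped_suspect = [(x, -2.0 * x) for x in xs]  # labels inverted to mimic poisoning

THRESHOLD = 1.0  # illustrative value; a real threshold would need calibration

for name, data in [("clean", clean_suspect), ("flipped", flipped_suspect)]:
    second_instance = transfer_learn(first_instance, data)
    q = quantify(first_instance, second_instance)
    print(name, round(q, 3), treat_as_poisoned(q, THRESHOLD))
```

In this toy setting, suspect data that agrees with what the first instance already learned leaves the unfrozen parameters where they were, while label-flipped data pulls them far away, so the drift crosses the threshold only in the second case. A parameter-drift metric is just one plausible choice of analysis; the claim language would equally cover other ways of deriving a likelihood from the second instance.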