US 12,271,491 B2
Detection and mitigation of machine learning model adversarial attacks
William Franklin Cameron, Jacksonville, FL (US); Pramod Goyal, Ahmedabad (IN); Prithvi Narayana Rao, Allen, TX (US); Manjit Rajaretnam, Irving, TX (US); and Miriam Silver, Tel Aviv (IL)
Assigned to CITIBANK, N.A.
Filed by Citibank, N.A., New York, NY (US)
Filed on Oct. 22, 2024, as Appl. No. 18/923,655.
Application 18/923,655 is a continuation-in-part of application No. 18/792,523, filed on Aug. 1, 2024.
Application 18/792,523 is a continuation-in-part of application No. 18/607,141, filed on Mar. 15, 2024.
Application 18/607,141 is a continuation-in-part of application No. 18/399,422, filed on Dec. 28, 2023.
Application 18/399,422 is a continuation of application No. 18/327,040, filed on May 31, 2023, granted, now 11,874,934, issued on Jan. 16, 2024.
Application 18/327,040 is a continuation-in-part of application No. 18/114,194, filed on Feb. 24, 2023, granted, now 11,763,006, issued on Sep. 19, 2023.
Application 18/114,194 is a continuation-in-part of application No. 18/098,895, filed on Jan. 19, 2023, granted, now 11,748,491, issued on Sep. 5, 2023.
Prior Publication US 2025/0053664 A1, Feb. 13, 2025
Int. Cl. G06F 21/57 (2013.01); G06F 21/55 (2013.01)
CPC G06F 21/577 (2013.01) [G06F 21/552 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A method for training a machine learning model, the method comprising:
accessing a candidate training dataset for the machine learning model;
evaluating the candidate training dataset using a first filter layer, wherein evaluating the candidate training dataset using the first filter layer comprises:
applying at least one provenance filter to verify a provenance of the candidate training dataset by verifying at least one of: a signature of the candidate training dataset, a hash of the candidate training dataset, or a secure sockets layer/transport layer security (SSL/TLS) certificate of a source of the candidate training dataset; and
when the provenance of the candidate training dataset is verified:
evaluating the candidate training dataset using a second filter layer, wherein evaluating the candidate training dataset using the second filter layer comprises:
determining at least one content filter to apply to the candidate training dataset, wherein the at least one content filter is configured to determine whether the candidate training dataset contains poisoned data;
applying the at least one content filter to the candidate training dataset;
determining, based on applying the at least one content filter to the candidate training dataset, an integrity level of the candidate training dataset;
determining whether the integrity level satisfies a threshold integrity value; and
when the integrity level satisfies the threshold integrity value, training the machine learning model using the candidate training dataset.
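 
For illustration only, and not as part of the claim language, the following Python sketch shows one way the first filter layer recited in claim 1 could verify provenance. The helper names, the HMAC-SHA256 signature scheme, and the TLS certificate probe are assumptions chosen to keep the example self-contained; the patent does not specify these particulars.

# Illustrative sketch (not the claimed implementation) of the first filter
# layer in claim 1: verify provenance by checking a hash, a signature, or the
# source's SSL/TLS certificate. Helper names and the HMAC scheme are assumed.
import hashlib
import hmac
import socket
import ssl


def verify_hash(dataset_bytes: bytes, expected_sha256: str) -> bool:
    # Compare the dataset's SHA-256 digest to the digest published by the source.
    return hashlib.sha256(dataset_bytes).hexdigest() == expected_sha256


def verify_signature(dataset_bytes: bytes, signature: bytes, shared_key: bytes) -> bool:
    # Verify an HMAC-SHA256 signature over the dataset (assumed signing scheme).
    expected = hmac.new(shared_key, dataset_bytes, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)


def verify_source_certificate(host: str, port: int = 443) -> bool:
    # Confirm the dataset source presents a certificate that passes default
    # chain and hostname validation.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=5) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls_sock:
                return tls_sock.getpeercert() is not None
    except (ssl.SSLError, OSError):
        return False


def provenance_verified(dataset_bytes, expected_sha256=None, signature=None,
                        shared_key=None, source_host=None):
    # First filter layer: claim 1 requires verifying at least one provenance
    # attribute; here every attribute the caller supplies must check out.
    checks = []
    if expected_sha256 is not None:
        checks.append(verify_hash(dataset_bytes, expected_sha256))
    if signature is not None and shared_key is not None:
        checks.append(verify_signature(dataset_bytes, signature, shared_key))
    if source_host is not None:
        checks.append(verify_source_certificate(source_host))
    return bool(checks) and all(checks)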
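 
Likewise for illustration only, the sketch below stands in for the second filter layer: a simple z-score outlier test serves as the content filter, the integrity level is taken to be the fraction of unflagged samples, and a 0.95 threshold gates training. The outlier test, the integrity metric, the threshold value, and the use of scikit-learn's LogisticRegression are assumptions, not particulars recited in the patent.

# Illustrative sketch (not the claimed implementation) of the second filter
# layer in claim 1: apply a content filter for poisoned data, compute an
# integrity level, and train only when the threshold integrity value is met.
import numpy as np
from sklearn.linear_model import LogisticRegression


def content_filter(features: np.ndarray, z_threshold: float = 4.0) -> np.ndarray:
    # Flag rows whose features deviate strongly from the column-wise mean;
    # returns a boolean mask that is True for suspected poisoned samples.
    mean = features.mean(axis=0)
    std = features.std(axis=0) + 1e-12  # avoid division by zero
    z_scores = np.abs((features - mean) / std)
    return (z_scores > z_threshold).any(axis=1)


def integrity_level(poisoned_mask: np.ndarray) -> float:
    # Integrity level = fraction of samples not flagged by the content filter.
    return 1.0 - float(poisoned_mask.mean())


def second_filter_layer(features: np.ndarray, labels: np.ndarray,
                        threshold_integrity: float = 0.95):
    # Apply the content filter, derive the integrity level, and train the
    # model on the candidate dataset only if the threshold is satisfied.
    poisoned = content_filter(features)
    level = integrity_level(poisoned)
    if level < threshold_integrity:
        return None, level  # threshold integrity value not satisfied
    model = LogisticRegression(max_iter=1000).fit(features, labels)
    return model, level

In a full pipeline the two sketches would run in sequence, with the second filter layer invoked only after provenance_verified returns True, mirroring the ordering recited in claim 1.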