US 12,455,909 B2
Method and system for protecting and removing private information used in large language models
Vijay Madisetti, Alpharetta, GA (US); and Arshdeep Bahga, Chandigarh (IN)
Assigned to Vijay Madisetti, Alpharetta, GA (US)
Filed by Vijay Madisetti, Alpharetta, GA (US)
Filed on Feb. 25, 2025, as Appl. No. 19/062,123.
Application 19/062,123 is a continuation of application No. 18/744,199, filed on Jun. 14, 2024, granted, now 12,306,859.
Application 18/744,199 is a continuation in part of application No. 18/406,906, filed on Jan. 8, 2024, granted, now 12,158,904, issued on Dec. 3, 2024.
Application 18/406,906 is a continuation in part of application No. 18/470,487, filed on Sep. 20, 2023, granted, now 12,147,461, issued on Nov. 19, 2024.
Application 18/470,487 is a continuation of application No. 18/348,692, filed on Jul. 7, 2023, granted, now 12,001,462, issued on Jun. 4, 2024.
Claims priority of provisional application 63/551,548, filed on Feb. 9, 2024.
Claims priority of provisional application 63/604,909, filed on Dec. 1, 2023.
Claims priority of provisional application 63/604,910, filed on Dec. 1, 2023.
Claims priority of provisional application 63/602,675, filed on Nov. 27, 2023.
Claims priority of provisional application 63/469,571, filed on May 30, 2023.
Claims priority of provisional application 63/463,913, filed on May 4, 2023.
Prior Publication US 2025/0217394 A1, Jul. 3, 2025
Int. Cl. G06F 16/3329 (2025.01); G06F 40/284 (2020.01)
CPC G06F 16/3329 (2019.01) [G06F 40/284 (2020.01)]
27 Claims
 
1. A method for generating adversarial data for use in a large language model (LLM) comprising:
receiving an input condition comprising a plurality of authentic data records;
extracting by a personally identifiable information (PII) extraction module an extracted authentic data record from the input condition, the extracted authentic data record comprising a plurality of data fields;
assigning a utility score to each data field of the plurality of data fields by a generator neural network using a utility function;
one of receiving an adversarial loss term indicating a target divergence between the extracted authentic data record and a synthetic data record and generating the adversarial loss term; and
generating the synthetic data record to be comprised by a plurality of synthetic data records and to comprise a plurality of synthetic data fields based on the input condition, the utility score of each data field of the plurality of data fields, and the adversarial loss term, each synthetic data field of the plurality of synthetic data fields being associated with a data field of the plurality of data fields.
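
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim assigns utility scores with a generator neural network, whereas this sketch substitutes a toy length-based utility function and a random search over candidates. Every name here (`extract_record`, `utility_scores`, `adversarial_loss`, `generate_synthetic`, the field names, and the character-mismatch divergence measure) is a hypothetical chosen for illustration and does not appear in the patent.

```python
import random
import string


def extract_record(input_condition, pii_fields, index=0):
    """Stand-in for the PII extraction module: select one authentic record
    and keep only its PII data fields (field names are assumptions)."""
    record = input_condition[index]
    return {f: str(record[f]) for f in pii_fields if f in record}


def utility_scores(record):
    """Toy utility function (assumption): each field's share of the
    record's characters. The claim uses a generator neural network here."""
    total = sum(len(v) for v in record.values()) or 1
    return {f: len(v) / total for f, v in record.items()}


def record_divergence(real, synth):
    """Mean fraction of differing character positions across fields."""
    if not real:
        return 0.0
    per_field = [
        sum(a != b for a, b in zip(real[f], synth[f])) / max(len(real[f]), 1)
        for f in real
    ]
    return sum(per_field) / len(per_field)


def adversarial_loss(real, synth, target):
    """Squared gap between the achieved divergence and the target divergence."""
    return (record_divergence(real, synth) - target) ** 2


def perturb(value, rate, rng):
    """Replace each alphanumeric character with probability `rate`,
    preserving length and punctuation."""
    out = []
    for ch in value:
        if ch.isalnum() and rng.random() < rate:
            pool = string.digits if ch.isdigit() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)
    return "".join(out)


def generate_synthetic(real, scores, target, rng, candidates=200):
    """Random search (assumption): sample candidates, perturbing
    higher-utility fields less, and keep the lowest-loss candidate."""
    best, best_loss = None, float("inf")
    for _ in range(candidates):
        base = rng.random()
        synth = {f: perturb(v, base * (1.0 - scores[f]), rng)
                 for f, v in real.items()}
        loss = adversarial_loss(real, synth, target)
        if loss < best_loss:
            best, best_loss = synth, loss
    return best, best_loss


# Usage with made-up data: one synthetic record per extracted authentic record.
records = [{"name": "alice smith", "email": "alice@example.com",
            "zip": "30004", "note": "not PII"}]
real = extract_record(records, ("name", "email", "zip"))
scores = utility_scores(real)
synth, loss = generate_synthetic(real, scores, target=0.3, rng=random.Random(0))
```

Each synthetic field stays keyed to the authentic field it was derived from, mirroring the claim's requirement that every synthetic data field be associated with a data field of the extracted record.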