US 12,387,007 B2
Personally identifiable information scrubber with language models
John V. Monaco, Foster City, CA (US)
Assigned to OpenAI OpCo, LLC, San Francisco, CA (US)
Filed by OpenAI Opco, LLC, San Francisco, CA (US)
Filed on Dec. 16, 2024, as Appl. No. 18/981,911.
Claims priority of provisional application 63/610,586, filed on Dec. 15, 2023.
Prior Publication US 2025/0200222 A1, Jun. 19, 2025
Int. Cl. G06F 21/62 (2013.01); G06F 40/284 (2020.01)
CPC G06F 21/6254 (2013.01) [G06F 40/284 (2020.01)] 20 Claims
 
1. A system for text sanitization, the system comprising:
one or more processors; and
one or more storage devices storing instructions that, when executed by the one or more processors, cause the system to perform operations comprising:
obtaining a first language model, training data including personally identifiable information (PII), and ground truth labels corresponding to the training data, the ground truth labels identifying PII included in the training data;
training the first language model to sanitize data using the training data and the ground truth labels;
training a second language model, using the first language model, to sanitize data, the second language model being less resource-intensive than the first language model;
obtaining input data including public PII and private PII; and
generating sanitized output data using the second language model and the input data.
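The five operations of claim 1 can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions, not the patented implementation. Each "language model" is replaced by a small hypothetical back-off tagger, `BackoffTagger`, that labels every token as `O` (not PII), `PUBLIC`, or `PRIVATE`. The claim does not say how the second model is trained "using the first". This sketch assumes hard-label knowledge distillation: the teacher labels unlabeled text and the student is fit to those labels. The student uses fewer features, so its lookup table is smaller; this stands in for "less resource-intensive". The claim also does not specify how public and private PII are treated. The sketch assumes private PII is redacted and public PII is kept. All function names, labels, and example sentences are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical stand-ins: each "language model" here is a back-off tagger that
# labels tokens "O" (not PII), "PUBLIC", or "PRIVATE". Not a real LM.

def shape(token):
    """Collapse characters into classes: 'Alice' -> 'Xx', '555-0100' -> 'd-d'."""
    out = []
    for ch in token:
        c = "X" if ch.isupper() else "x" if ch.islower() else "d" if ch.isdigit() else ch
        if not out or out[-1] != c:
            out.append(c)
    return "".join(out)

def word_feat(tokens, i):
    return ("word", tokens[i].lower())

def prev_feat(tokens, i):
    return ("prev", tokens[i - 1].lower() if i > 0 else "<s>")

def shape_feat(tokens, i):
    return ("shape", shape(tokens[i]))

class BackoffTagger:
    """Predicts with the first (most specific) feature seen during training."""

    def __init__(self, feature_fns):
        self.feature_fns = feature_fns
        self.table = defaultdict(Counter)  # feature -> label counts

    def fit(self, corpus):
        for tokens, labels in corpus:
            for i, label in enumerate(labels):
                for fn in self.feature_fns:
                    self.table[fn(tokens, i)][label] += 1
        return self

    def predict(self, tokens):
        labels = []
        for i in range(len(tokens)):
            label = "O"  # default when no feature was seen in training
            for fn in self.feature_fns:
                counts = self.table.get(fn(tokens, i))
                if counts:
                    label = counts.most_common(1)[0][0]
                    break
            labels.append(label)
        return labels

def distill(teacher, student_feature_fns, unlabeled):
    """Train a smaller student on the teacher's labels (hard-label distillation)."""
    pseudo_labeled = [(toks, teacher.predict(toks)) for toks in unlabeled]
    return BackoffTagger(student_feature_fns).fit(pseudo_labeled)

def sanitize(model, text):
    """Redact tokens tagged PRIVATE; keep PUBLIC and non-PII tokens (assumed policy)."""
    tokens = text.split()
    labels = model.predict(tokens)
    return " ".join("[REDACTED]" if lab == "PRIVATE" else tok
                    for tok, lab in zip(tokens, labels))

# Training data with ground-truth PII labels (hypothetical examples).
ground_truth = [
    ("email alice@corp.com about the memo".split(), ["O", "PRIVATE", "O", "O", "O"]),
    ("Senator Lincoln spoke today".split(), ["O", "PUBLIC", "O", "O"]),
    ("call 555-0100 tonight".split(), ["O", "PRIVATE", "O"]),
    ("Bob met Senator Warren".split(), ["PRIVATE", "O", "O", "PUBLIC"]),
]
# First (larger) model: also uses previous-word context.
teacher = BackoffTagger([word_feat, prev_feat, shape_feat]).fit(ground_truth)

# Second (smaller) model: trained only on the teacher's outputs.
unlabeled = [s.split() for s in ["email bob@home.org today",
                                 "Senator Grant spoke",
                                 "call 555-0199 today"]]
student = distill(teacher, [word_feat, shape_feat], unlabeled)

# Sanitize input containing both public and private PII with the student.
print(sanitize(student, "Senator Grant will email carol@mail.net or call 555-0142"))
```

The student never saw a ground-truth label. It learned that "Grant" is public and that email- and phone-shaped tokens are private only from the teacher's outputs. The student then applies that knowledge to new input with a smaller table and fewer feature lookups per token.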