CPC G06F 21/6254 (2013.01) [G06F 40/284 (2020.01)]
20 Claims

1. A system for text sanitization, the system comprising:
one or more processors; and
one or more storage devices storing instructions that, when executed by the one or more processors, cause the system to perform operations comprising:
obtaining a first language model, training data including personally identifiable information (PII), and ground truth labels corresponding to the training data, the ground truth labels identifying PII included in the training data;
training the first language model to sanitize data using the training data and the ground truth labels;
training a second language model, using the first language model, to sanitize data, the second language model being less resource-intensive than the first language model;
obtaining input data including public PII and private PII; and
generating sanitized output data using the second language model and the input data.
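Read technically, claim 1 describes a teacher-student arrangement: a larger model is trained on labeled PII, and a cheaper model is then trained from it. Knowledge distillation is one common way to carry out the "using the first language model" step, but the claim itself does not name a technique. What follows is a minimal, self-contained sketch built on stated assumptions. Toy lookup-table token taggers stand in for both language models. The three-way label set (`O`, `PUBLIC`, `PRIVATE`), the word-shape features, the temperature of 2.0, the sample sentences, and the rule that only private PII is masked are all hypothetical choices and do not come from the claim.

```python
from collections import Counter, defaultdict

# Hypothetical label set (an assumption): not PII, public PII, private PII.
LABELS = ("O", "PUBLIC", "PRIVATE")


def soften(probs, temperature):
    """Temperature-soften a distribution: softmax(log p / T) equals p**(1/T) renormalised."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def word_shape(token):
    """Coarse shape: letters -> 'a', digits -> 'd', runs collapsed ('555-0142' -> 'd-d')."""
    out = []
    for ch in token:
        cls = "d" if ch.isdigit() else "a" if ch.isalpha() else ch
        if not out or out[-1] != cls:
            out.append(cls)
    return "".join(out)


class TeacherTagger:
    """Toy stand-in for the first (larger) model: one smoothed label distribution per word."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.counts = defaultdict(Counter)  # word -> label counts

    def fit(self, sentences, labels):
        # Supervised training on the ground-truth PII labels.
        for tokens, tags in zip(sentences, labels):
            for tok, tag in zip(tokens, tags):
                self.counts[tok][tag] += 1
        return self

    def predict_proba(self, token):
        # Add-alpha smoothing; unseen words get a uniform distribution.
        c = self.counts.get(token, Counter())
        total = sum(c.values()) + self.alpha * len(LABELS)
        return [(c[lab] + self.alpha) / total for lab in LABELS]


class StudentTagger:
    """Toy stand-in for the second, cheaper model: one distribution per word *shape*."""

    def __init__(self):
        self.table = {}

    def distill(self, teacher, transfer_tokens, temperature=2.0):
        # Trained only on the teacher's softened outputs, never on ground-truth labels.
        # For a free per-bucket distribution, the minimiser of cross-entropy against
        # those soft targets is their mean within the bucket.
        sums = defaultdict(lambda: [0.0] * len(LABELS))
        counts = Counter()
        for tok in transfer_tokens:
            key = word_shape(tok)
            soft = soften(teacher.predict_proba(tok), temperature)
            sums[key] = [s + p for s, p in zip(sums[key], soft)]
            counts[key] += 1
        self.table = {k: [s / counts[k] for s in v] for k, v in sums.items()}
        return self

    def tag(self, token):
        # Unknown shapes get a uniform distribution; max() then returns the first label, "O".
        probs = self.table.get(word_shape(token), [1.0 / len(LABELS)] * len(LABELS))
        return LABELS[max(range(len(LABELS)), key=lambda i: probs[i])]


def sanitize(student, text, mask="[PRIVATE]"):
    # Assumption: private PII is masked, public PII is kept verbatim.
    return " ".join(mask if student.tag(t) == "PRIVATE" else t for t in text.split())


train = [
    ("email alice@mail.com today", "O PRIVATE O"),
    ("email bob@mail.com now", "O PRIVATE O"),
    ("visit acme.com today", "O PUBLIC O"),
    ("visit globex.com now", "O PUBLIC O"),
    ("call 555-0142 today", "O PRIVATE O"),
]
sentences = [s.split() for s, _ in train]
labels = [t.split() for _, t in train]

teacher = TeacherTagger().fit(sentences, labels)
# Transfer set: the training text with its labels dropped.
student = StudentTagger().distill(teacher, [tok for s in sentences for tok in s])

print(sanitize(student, "email carol@mail.org or visit initech.com"))
# prints: email [PRIVATE] or visit initech.com
```

In this toy, the student's table has 4 entries (one per shape), while the teacher's has 10 (one per word). That size gap is the sketch's analogue of being "less resource-intensive". Even so, the student masks an unseen email address and keeps an unseen company domain. With neural models, the same objective is usually minimised by gradient descent on a cross-entropy or KL-divergence loss between the two models' temperature-scaled outputs.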