| CPC G06F 21/6245 (2013.01) [G06F 18/2132 (2023.01); G06N 20/00 (2019.01); G06V 20/62 (2022.01); G06V 30/274 (2022.01)] | 20 Claims |

|
1. A computer-implemented method comprising:
generating, based on a plurality of confidential data in a document, text embeddings;
inputting the text embeddings into a machine learning model to generate a plurality of synthetic images, wherein each of the plurality of synthetic images corresponds to one of the plurality of confidential data;
receiving, from a first computing device of a data labeler, a label for each of the plurality of synthetic images; and
sending, to a second computing device, the plurality of confidential data and the label for each of the plurality of synthetic images corresponding to the plurality of confidential data.
|
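The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: the claim does not specify an embedding method, a model architecture, or a transport between devices. Every name below (`embed_text`, `generate_synthetic_image`, `mean_brightness_labeler`, `run_pipeline`) is hypothetical, the hash-based embedding and the pixel-grid "image" are toy stand-ins for a real text encoder and a real generative model, and the two computing devices are reduced to a callback and a return value. The sketch shows only the property the claim relies on: the labeler receives synthetic images, never the confidential data, and the labels are then paired back with the confidential data.

```python
import hashlib
from typing import Callable, Dict, List

Embedding = List[float]
Image = List[List[int]]  # tiny grayscale grid; a stand-in for a synthetic image


def embed_text(value: str, dim: int = 8) -> Embedding:
    """Toy text embedding (assumption): SHA-256 bytes scaled to [0, 1]."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


def generate_synthetic_image(embedding: Embedding, size: int = 4) -> Image:
    """Toy stand-in for the machine learning model: embedding -> pixel grid."""
    n = len(embedding)
    return [
        [round(255 * embedding[(r * size + c) % n]) for c in range(size)]
        for r in range(size)
    ]


def mean_brightness_labeler(image: Image) -> str:
    """Hypothetical data labeler (first device): sees only the image."""
    pixels = [p for row in image for p in row]
    return "bright" if sum(pixels) / len(pixels) > 127 else "dark"


def run_pipeline(
    confidential: List[str], labeler: Callable[[Image], str]
) -> List[Dict[str, str]]:
    """Embed, synthesize, collect labels, then pair labels with the data."""
    # Step 1: generate text embeddings from the confidential data.
    embeddings = [embed_text(v) for v in confidential]
    # Step 2: one synthetic image per confidential data item.
    images = [generate_synthetic_image(e) for e in embeddings]
    # Step 3: the labeler is given images only, not the original strings.
    labels = [labeler(img) for img in images]
    # Step 4: the payload for the second device pairs each data item
    # with the label assigned to its corresponding synthetic image.
    return [{"data": v, "label": lab} for v, lab in zip(confidential, labels)]


# Example values are invented for illustration.
records = run_pipeline(["Jane Doe", "123-45-6789"], mean_brightness_labeler)
```

In this sketch the correspondence between a confidential data item and its image is kept by list position. A real system would more likely track it with an explicit identifier, since labels may come back from the labeler's device out of order.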