US 12,412,038 B2
Training language models and preserving privacy
Franck Dernoncourt, Spokane, WA (US); Tong Sun, San Ramon, CA (US); Thi Kim Phung Lai, San Jose, CA (US); Rajiv Bhawanji Jain, Falls Church, VA (US); Nikolaos Barmpalios, Sunnyvale, CA (US); and Jiuxiang Gu, Baltimore, MD (US)
Assigned to Adobe Inc., San Jose, CA (US)
Filed by Adobe Inc., San Jose, CA (US)
Filed on Feb. 23, 2023, as Appl. No. 18/173,199.
Claims priority of provisional application 63/413,519, filed on Oct. 5, 2022.
Prior Publication US 2024/0135103 A1, Apr. 25, 2024
Int. Cl. G06F 40/295 (2020.01); G06F 40/274 (2020.01)
CPC G06F 40/295 (2020.01) [G06F 40/274 (2020.01)] 19 Claims
OG exemplary drawing
 
1. A method comprising:
receiving, by a processing device, input data describing a sequence of words ending with a last word;
predicting, by the processing device, a next word after the last word in the sequence of words by processing the input data using a machine learning model trained on injected Gaussian noise and training data to update parameters of the machine learning model to predict next words after last words in sequences of words, the training data describing a corpus of text associated with clients and including sensitive samples and non-sensitive samples taken from client-content adjacent databases, the client-content adjacent databases differing in that a client and a sensitive entity are present in one of the client-content adjacent databases and are not present in another one of the client-content adjacent databases; and
generating, by the processing device, an indication of the next word after the last word in the sequence of words for display in a user interface.
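The training recited in the claim injects Gaussian noise during optimization, which is the core mechanism of differentially private training (e.g., DP-SGD): each sample's gradient is clipped to bound its influence, and Gaussian noise calibrated to that clipping bound is added so that the trained parameters do not reveal whether any single client or sensitive entity was present, matching the claim's notion of client-content adjacent databases. The sketch below is a minimal, hypothetical illustration of that idea; the model class TinyLM, the toy corpus, and the hyperparameters clip_norm and noise_multiplier are assumptions for exposition and are not taken from the patent, which does not limit the model architecture or noise calibration to these choices.

```python
# Illustrative sketch only: DP-SGD-style training of a next-word predictor
# with injected Gaussian noise. Names and hyperparameters are hypothetical,
# not the patented implementation.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy vocabulary and (context words, next word) training pairs.
vocab = ["<pad>", "the", "model", "predicts", "next", "word"]
stoi = {w: i for i, w in enumerate(vocab)}
pairs = [
    (["the", "model"], "predicts"),
    (["predicts", "next"], "word"),
    (["the", "next"], "word"),
]

class TinyLM(nn.Module):
    """Minimal next-word predictor: mean-pooled embeddings -> linear head."""
    def __init__(self, vocab_size: int, dim: int = 16):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, ctx: torch.Tensor) -> torch.Tensor:
        return self.head(self.emb(ctx).mean(dim=0))

model = TinyLM(len(vocab))
loss_fn = nn.CrossEntropyLoss()
lr, clip_norm, noise_multiplier = 0.1, 1.0, 0.5  # assumed values

for epoch in range(50):
    # Accumulate per-sample gradients, each clipped to clip_norm, then
    # inject Gaussian noise once per parameter update (DP-SGD style).
    grads = [torch.zeros_like(p) for p in model.parameters()]
    for ctx_words, nxt in pairs:
        model.zero_grad()
        ctx = torch.tensor([stoi[w] for w in ctx_words])
        logits = model(ctx)
        loss = loss_fn(logits.unsqueeze(0), torch.tensor([stoi[nxt]]))
        loss.backward()
        # Clip this sample's gradient to bound its influence on the update.
        total_norm = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))
        scale = torch.clamp(clip_norm / (total_norm + 1e-6), max=1.0)
        for g, p in zip(grads, model.parameters()):
            g += p.grad * scale
    with torch.no_grad():
        for g, p in zip(grads, model.parameters()):
            # Gaussian noise calibrated to the clipping bound masks any
            # single sample's contribution to the summed gradient.
            noise = torch.randn_like(g) * noise_multiplier * clip_norm
            p -= lr * (g + noise) / len(pairs)

# Inference: predict the next word after the last word of a sequence.
# Output varies with the injected noise; this only shows the interface.
ctx = torch.tensor([stoi["the"], stoi["model"]])
print("next word:", vocab[model(ctx).argmax().item()])
```

The per-sample clipping step is what gives the adjacency guarantee its teeth: because no one sample can move the summed gradient by more than clip_norm, noise of scale noise_multiplier * clip_norm statistically hides whether a given sensitive sample was in the training corpus at all.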