US 12,033,617 B2
Adversarial language imitation with constrained exemplars
Hamid Palangi, Vancouver (CA); Saadia Kai Gabriel, Redmond, WA (US); Thomas Hartvigsen, Redmond, WA (US); Dipankar Ray, Redmond, WA (US); and Semiha Ece Kamar Eden, Redmond, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Feb. 17, 2022, as Appl. No. 17/674,044.
Prior Publication US 2023/0260506 A1, Aug. 17, 2023
Int. Cl. G10L 15/08 (2006.01)
CPC G10L 15/08 (2013.01) [G06F 2218/20 (2023.01); G10L 2015/081 (2013.01)] 20 Claims
 
1. A method for generating a phrase that is confusing for a language classifier (LC), the method comprising:
determining, by the LC, a first classification score (CS) of a prompt indicating whether the prompt is a first class or a second class;
predicting, based on the prompt and by a pre-trained language model (PLM), likely next words and a corresponding probability for each of the likely next words;
determining, by the LC, a second CS for each of the likely next words;
determining, by an adversarial classifier, respective scores for each of the likely next words, the respective scores determined based on the first CS of the prompt, the second CS of the likely next words, and the probabilities of the likely next words; and
selecting, by the adversarial classifier, a likely next word of the likely next words based on the respective scores.
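The claimed steps can be illustrated with a minimal sketch. The claim does not say how the first CS, the second CS, and the PLM probabilities are combined, so everything below is an assumption. The sketch uses a hypothetical log-linear score: the PLM log-probability of a candidate plus a weighted log-probability that the classifier assigns the extended text to the class *opposite* the prompt's class. The function name `adversarial_next_word`, the `weight` and `threshold` parameters, the choice to classify the prompt extended by each candidate word, and the toy classifier are all hypothetical. A plain dictionary stands in for the PLM's predicted next words and probabilities, and a callable stands in for the LC.

```python
import math
from typing import Callable, Dict, Tuple


def adversarial_next_word(
    prompt: str,
    next_word_probs: Dict[str, float],
    classify: Callable[[str], float],
    weight: float = 1.0,
    threshold: float = 0.5,
) -> Tuple[str, Dict[str, float]]:
    """Hypothetical sketch: pick the likely next word that best confuses the LC.

    `classify` returns the LC's probability that a text is the first class.
    `next_word_probs` stands in for the PLM's likely next words and probabilities.
    """
    eps = 1e-12  # guards against log(0)

    # First CS: the LC's score for the prompt itself.
    prompt_cs = classify(prompt)
    prompt_is_first_class = prompt_cs >= threshold

    scores: Dict[str, float] = {}
    for word, prob in next_word_probs.items():
        # Second CS (assumption): the LC's score for the prompt extended by the word.
        word_cs = classify(prompt + " " + word)
        # Probability the LC assigns to the class opposite the prompt's class.
        opposite = 1.0 - word_cs if prompt_is_first_class else word_cs
        # Assumed combination: fluency (PLM) term plus weighted adversarial term.
        scores[word] = math.log(prob + eps) + weight * math.log(opposite + eps)

    # Select the candidate with the highest combined score.
    best = max(scores, key=scores.get)
    return best, scores


def toy_classify(text: str) -> float:
    """Hypothetical LC: first-class probability grows with occurrences of 'bad'."""
    return min(0.95, 0.25 + 0.3 * text.split().count("bad"))


if __name__ == "__main__":
    candidates = {"bad": 0.6, "fine": 0.3, "ok": 0.1}
    best, scores = adversarial_next_word("this is bad", candidates, toy_classify)
    print(best, scores)
```

With the toy classifier, the prompt "this is bad" scores 0.55, which is the first class. The PLM's most probable word "bad" would push the classifier further toward the first class, so the sketch prefers "fine": it remains fairly probable under the PLM and moves the classifier toward the second class. Raising `weight` trades fluency for confusion, and setting it to zero reduces the procedure to ordinary greedy decoding.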