US 12,190,888 B2
System and method for secure training of speech processing systems
Shou-Chun Yin, Brossard (CA); Junho Park, Bedford, MA (US); Dushyant Sharma, Tracy, CA (US); DoYeong Kim, Lexington, MA (US); and Francesco Nespoli, London (GB)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Nuance Communications, Inc., Burlington, MA (US)
Filed on Jun. 15, 2022, as Appl. No. 17/840,795.
Prior Publication US 2023/0410814 A1, Dec. 21, 2023
Int. Cl. G10L 15/26 (2006.01); G10L 13/04 (2013.01); G10L 15/06 (2013.01)
CPC G10L 15/26 (2013.01) [G10L 13/04 (2013.01); G10L 15/063 (2013.01)]
20 Claims
 
1. A computer-implemented method, executed on a computing device, comprising:
generating an obscured speech signal from an input speech signal and an obscured transcription from a transcription of the input speech signal, wherein the obscured speech signal and the obscured transcription include obscured representations of sensitive content from the input speech signal and the transcription of the input speech signal;
extracting a speaker embedding from the input speech signal;
generating a speaker embedding delta based upon, at least in part, the extracted speaker embedding and a synthetic speaker embedding;
generating a synthetic speech signal from the obscured speech signal using the synthetic speaker embedding;
generating a residual signal based upon, at least in part, the obscured speech signal and the speaker embedding delta; and
training a speech processing system using the obscured transcription, the synthetic speech signal, the speaker embedding delta, and the residual signal.
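
The data flow in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim does not say how sensitive content is detected or how the speaker embedding delta is computed, so the sketch assumes that digit runs stand in for sensitive content and that the delta is an element-wise difference of the two embeddings. All function names, the `<MASKED>` token, and the sample values are hypothetical. Speech synthesis, residual generation, and training depend on models the claim does not specify and are omitted.

```python
import re
from typing import List

# Assumption: digit runs stand in for "sensitive content". A real system
# would use a dedicated detector for names, dates, identifiers, and so on.
SENSITIVE_PATTERN = re.compile(r"\d+")


def obscure_transcription(text: str, mask: str = "<MASKED>") -> str:
    """Replace each assumed-sensitive span in the transcription with a mask token."""
    return SENSITIVE_PATTERN.sub(mask, text)


def speaker_embedding_delta(extracted: List[float], synthetic: List[float]) -> List[float]:
    """Assumed delta: element-wise difference (extracted minus synthetic)."""
    if len(extracted) != len(synthetic):
        raise ValueError("embeddings must have the same dimension")
    return [e - s for e, s in zip(extracted, synthetic)]


def apply_delta(synthetic: List[float], delta: List[float]) -> List[float]:
    """Recover the extracted embedding from the synthetic embedding and the delta."""
    return [s + d for s, d in zip(synthetic, delta)]


# Hypothetical sample data; real embeddings come from a speaker encoder.
transcript = "patient 48213 was admitted on day 12"
extracted_embedding = [0.5, -1.0, 2.0]
synthetic_embedding = [0.25, 0.5, 1.0]

obscured = obscure_transcription(transcript)
delta = speaker_embedding_delta(extracted_embedding, synthetic_embedding)
```

Under the difference assumption, the original speaker's embedding never needs to be stored with the training data: adding the delta to the synthetic embedding recovers it, as `apply_delta(synthetic_embedding, delta)` shows.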