US 12,260,866 B2
System and method for watermarking audio data for automated speech recognition (ASR) systems
Patrick Aubrey Naylor, Reading (GB); Dushyant Sharma, Mountain House, CA (US); William Francis Ganong, III, Brookline, MA (US); Uwe Helmut Jost, Groton, MA (US); and Ljubomir Milanovic, Vienna (AT)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Aug. 30, 2022, as Appl. No. 17/898,962.
Prior Publication US 2024/0071396 A1, Feb. 29, 2024
Int. Cl. G10L 19/018 (2013.01); G10L 15/22 (2006.01); G10L 25/21 (2013.01); G10L 25/51 (2013.01)
CPC G10L 19/018 (2013.01) [G10L 15/22 (2013.01); G10L 25/21 (2013.01); G10L 25/51 (2013.01)]
15 Claims
 
1. A computer-implemented method, executed on a computing device, comprising:
processing audio information associated with a speech processing system;
encoding a watermark in a non-disruptive portion of the audio information during storage and/or transmission of the audio information including:
determining a pair of time frequency points with time frequency points from a same channel;
generating a watermark database with a plurality of keys mapping a plurality of watermarks to particular pairs of time frequency points for determining or identifying the particular pairs of time frequency points; and
recording a key mapping the watermark in the audio information to a relative phase difference between the pair of time frequency points from the same channel in the watermark database; and
decoding the watermark using the key from the watermark database to identify the pair of time frequency points and by determining the relative phase difference between the identified pair of time frequency points.
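
The encoding and decoding steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a single channel, non-overlapping 16-sample frames with a plain DFT as the time frequency transform, and a one-bit watermark stored as a relative phase difference of 0 or pi radians. The names FRAME, stft, istft, encode_bit and decode_bit, and the key layout ((t1, k1), (t2, k2)), are hypothetical.

```python
import cmath
import math

FRAME = 16  # samples per non-overlapping frame (assumed value)

def dft(frame):
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
            for k in range(n)]

def idft(spec):
    n = len(spec)
    return [(sum(X * cmath.exp(2j * math.pi * k * i / n) for k, X in enumerate(spec)) / n).real
            for i in range(n)]

def stft(signal):
    # One spectrum per full frame; frames[t][k] is a time frequency point.
    return [dft(signal[i:i + FRAME]) for i in range(0, len(signal) - FRAME + 1, FRAME)]

def istft(frames):
    out = []
    for spec in frames:
        out.extend(idft(spec))
    return out

def encode_bit(signal, key, bit):
    """Set the phase of the second point relative to the first (same channel)."""
    (t1, k1), (t2, k2) = key
    if (t1, k1) == (t2, k2) or not (0 < k1 < FRAME // 2 and 0 < k2 < FRAME // 2):
        raise ValueError("key must name two distinct points with 0 < bin < FRAME/2")
    frames = stft(signal)
    reference = cmath.phase(frames[t1][k1])
    target = reference + (math.pi if bit else 0.0)
    magnitude = abs(frames[t2][k2])  # magnitude is left untouched
    frames[t2][k2] = cmath.rect(magnitude, target)
    # Mirror bin keeps the spectrum conjugate-symmetric, so the output stays real.
    frames[t2][FRAME - k2] = frames[t2][k2].conjugate()
    tail = list(signal[len(frames) * FRAME:])  # samples past the last full frame
    return istft(frames) + tail

def decode_bit(signal, key):
    """Read the bit back from the relative phase difference of the keyed pair."""
    (t1, k1), (t2, k2) = key
    frames = stft(signal)
    diff = cmath.phase(frames[t2][k2] * frames[t1][k1].conjugate())
    return 1 if abs(diff) > math.pi / 2 else 0
```

In this sketch only the phase of one time frequency point changes, which is one possible reading of encoding in a "non-disruptive portion" of the audio; the claim itself does not prescribe a transform, frame size, or phase alphabet.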
 
8. A computing system comprising:
a memory; and
a processor configured to process automated speech recognition (ASR) audio information for storage and/or transmission,
wherein the processor is further configured to encode a watermark in phase information of the ASR audio information during the storage and/or transmission of the ASR audio information including:
to determine a pair of time frequency points with time frequency points from a same channel,
to generate a watermark database with a plurality of keys mapping a plurality of watermarks to particular pairs of time frequency points for determining or identifying the particular pairs of time frequency points, and
to record a key mapping the watermark in the ASR audio information to a relative phase difference between the pair of time frequency points from the same channel in the watermark database, and
wherein the processor is further configured to decode the watermark using the key from the watermark database to identify during the storage and/or transmission of the ASR audio information including the pair of time frequency points and by determining the relative phase difference between the identified pair of time frequency points.
 
11. A computer program product residing on a non-transitory computer readable medium having a plurality of instructions stored thereon which, when executed by a processor, cause the processor to perform operations comprising:
processing automated speech recognition (ASR) audio information for storage and/or transmission;
determining a pair of time frequency points within the ASR audio information with time frequency points from a same channel; and
encoding a watermark in relative phase information of the ASR audio information between the pair of time frequency points during storage and/or transmission of the audio information including:
generating a watermark database with a plurality of keys mapping a plurality of watermarks to particular pairs of time frequency points for determining or identifying the particular pairs of time frequency points; and
recording a key mapping the watermark in the ASR audio information to a relative phase difference between the pair of time frequency points from the same channel in the watermark database; and
decoding the watermark using the key from the watermark database to identify the pair of time frequency points and by determining the relative phase difference between the identified pair of time frequency points.
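
The watermark database recited in the claims, with keys mapping watermarks to particular pairs of time frequency points and to a relative phase difference, might be modelled as in the following sketch. This is an illustration under assumptions, not the claimed data structure: the class names WatermarkKey and WatermarkDatabase, the (frame, bin) representation of a time frequency point, the string watermark identifiers, and the matching tolerance are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

TFPoint = Tuple[int, int]  # (frame index, frequency bin); assumed representation

@dataclass(frozen=True)
class WatermarkKey:
    channel: int        # both points come from this same channel
    first: TFPoint
    second: TFPoint
    phase_diff: float   # relative phase difference in radians

class WatermarkDatabase:
    def __init__(self) -> None:
        self._keys: Dict[str, WatermarkKey] = {}

    def record(self, watermark_id: str, key: WatermarkKey) -> None:
        """Record the key that maps a watermark to its pair and phase difference."""
        if key.first == key.second:
            raise ValueError("a key needs two distinct time frequency points")
        self._keys[watermark_id] = key

    def lookup(self, watermark_id: str) -> WatermarkKey:
        """Return the key, identifying which pair of points to inspect."""
        return self._keys[watermark_id]

    def match(self, channel: int, first: TFPoint, second: TFPoint,
              measured: float, tol: float = 0.1) -> List[str]:
        """Return ids of watermarks whose key fits the pair and the measured phase."""
        hits = []
        for watermark_id, key in self._keys.items():
            if (key.channel, key.first, key.second) != (channel, first, second):
                continue
            # Compare phases on the circle so that -pi and +pi count as equal.
            if abs(math.remainder(measured - key.phase_diff, 2 * math.pi)) <= tol:
                hits.append(watermark_id)
        return hits
```

A decoder in this sketch would call lookup to find the pair of points for a watermark, measure the relative phase difference between them in the audio, and pass that measurement to match.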