US 12,488,784 B2
System and method for adapting natural language understanding (NLU) engines optimized on text to audio input
Jean-Francois Lavallee, Quebec (CA)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Aug. 29, 2022, as Appl. No. 17/897,308.
Prior Publication US 2024/0071368 A1, Feb. 29, 2024
Int. Cl. G10L 15/187 (2013.01); G06F 40/242 (2020.01); G10L 15/02 (2006.01); G10L 15/06 (2013.01)
CPC G10L 15/063 (2013.01) [G06F 40/242 (2020.01); G10L 15/02 (2013.01); G10L 15/187 (2013.01); G10L 2015/025 (2013.01)] 12 Claims
 
1. A computer-implemented method, executed on a computing device, comprising:
generating a plurality of potential vocalizations of a plurality of text samples, wherein generating the plurality of potential vocalizations includes processing the plurality of text samples using a text-to-speech tokenizer that converts each text sample into a token representing the speech content of the respective text sample;
identifying a plurality of phonemes associated with the plurality of potential vocalizations, wherein identifying the plurality of phonemes associated with the plurality of potential vocalizations includes matching at least a portion of the plurality of potential vocalizations to the plurality of phonemes from a phonetic dictionary;
generating a plurality of phonetically-related text portions based upon, at least in part, the plurality of phonemes, wherein generating the plurality of phonetically-related text portions includes:
identifying a plurality of sequences of text portions with at least a threshold similarity to the plurality of phonemes; and
selecting a sequence of text portions from the plurality of sequences of text portions that covers each of the plurality of text samples and is in a same order as the plurality of text samples, thus defining the plurality of phonetically-related text portions; and
training a natural language understanding (NLU) engine using the plurality of phonetically-related text portions.
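The phoneme-matching and sequence-selection steps of claim 1 can be illustrated with a small sketch. It is not the patented implementation and makes several assumptions. The TTS tokenizer and the dictionary matching are collapsed into a lookup in a hypothetical toy ARPAbet-style dictionary (`PHONETIC_DICT`). The "threshold similarity" is assumed to be `difflib.SequenceMatcher.ratio()` over phoneme sequences. Candidate text portions are found by brute-force enumeration of a tiny vocabulary. The final NLU training step is omitted. All function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical toy phonetic dictionary (ARPAbet-style symbols, stress marks dropped).
# Stands in for the claim's "phonetic dictionary"; a real system would use a full
# lexicon and derive phonemes from the TTS tokenizer output.
PHONETIC_DICT = {
    "recognize": ("R", "EH", "K", "AH", "G", "N", "AY", "Z"),
    "speech": ("S", "P", "IY", "CH"),
    "wreck": ("R", "EH", "K"),
    "a": ("AH",),
    "nice": ("N", "AY", "S"),
    "beach": ("B", "IY", "CH"),
}


def to_phonemes(words):
    """Concatenate the dictionary phonemes of a word sequence."""
    phonemes = []
    for word in words:
        phonemes.extend(PHONETIC_DICT[word])
    return phonemes


def phoneme_similarity(a, b):
    """Similarity in [0, 1] between two phoneme sequences (assumed metric)."""
    return SequenceMatcher(None, a, b).ratio()


def candidate_sequences(sample, threshold=0.75, max_words=4):
    """Word sequences whose phonemes are at least `threshold` similar to `sample`'s.

    Brute-force enumeration over the toy vocabulary. The sample itself is excluded
    so that only alternative, phonetically related spellings are returned.
    """
    target = to_phonemes(sample)
    vocab = sorted(PHONETIC_DICT)
    scored = []
    for n in range(1, max_words + 1):
        for words in product(vocab, repeat=n):
            if list(words) == list(sample):
                continue
            score = phoneme_similarity(target, to_phonemes(words))
            if score >= threshold:
                scored.append((score, words))
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored


def select_covering_sequence(samples, threshold=0.75, max_words=4):
    """Pick one related text portion per sample, preserving the sample order.

    Falls back to the original sample when no candidate clears the threshold,
    so every sample is covered by exactly one output entry.
    """
    selected = []
    for sample in samples:
        candidates = candidate_sequences(sample, threshold, max_words)
        best = candidates[0][1] if candidates else tuple(sample)
        selected.append(" ".join(best))
    return selected
```

For example, `select_covering_sequence([["recognize", "speech"], ["nice", "beach"]])` returns one phonetically related string per input sample, in the same order as the inputs. Under the assumed metric, "wreck a nice beach" scores about 0.82 against "recognize speech" and so clears the 0.75 threshold. In this reading of the claim, the selected strings would be added to the NLU training text.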