US 12,087,305 B2
Speech processing
Beiye Liu, Bedford Hills, NY (US); Wael Hamza, Yorktown Heights, NY (US); Liwei Cai, Cambridge, MA (US); Konstantine Arkoudas, New York, NY (US); Chengwei Su, Belmont, MA (US); and Subendhu Rongali, Amherst, MA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on May 26, 2023, as Appl. No. 18/324,440.
Application 18/324,440 is a continuation of application No. 17/106,600, filed on Nov. 30, 2020, granted, now Pat. No. 11,682,400.
Prior Publication US 2023/0368796 A1, Nov. 16, 2023
Int. Cl. G10L 21/00 (2013.01); G10L 15/18 (2013.01); G10L 15/26 (2006.01)
CPC G10L 15/26 (2013.01) [G10L 15/1822 (2013.01)]
20 Claims
 
1. A computer-implemented method, comprising:
receiving first input data representing speech;
processing the first input data using a spoken language understanding (SLU) component, the SLU component configured:
to perform an audio-to-text processing task, and
to perform an audio-to-semantic meaning data task,
wherein the SLU component is trained using a first training dataset including masked automatic speech recognition (ASR) data comprising first masked ASR data corresponding to a first spoken input, wherein at least one word of the first spoken input is masked in the first ASR data;
determining, based on processing the first input data using the SLU component, first data representing a semantic meaning corresponding to the first input data; and
determining, using the first data, first output data responsive to the first input data.
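
As an illustration of the masked ASR training data recited in claim 1, the following minimal Python sketch shows one way a word of an ASR transcript might be masked before the transcript is added to a training dataset. The mask token, the function name mask_transcript, the random selection of the masked position, and the sample utterance are all hypothetical and are not taken from the patent. The claim requires only that at least one word of the spoken input be masked, and it does not specify how that word is chosen.

```python
import random

# Hypothetical placeholder token; the claim does not name a specific mask symbol.
MASK_TOKEN = "[MASK]"


def mask_transcript(words, num_masked=1, rng=None):
    """Return a copy of `words` with `num_masked` positions replaced by MASK_TOKEN.

    Assumption: positions are chosen uniformly at random. The claim only
    requires that at least one word of the spoken input is masked.
    """
    if not words:
        return []
    rng = rng or random.Random()
    # Mask at least one word, and never more words than the transcript contains.
    k = max(1, min(num_masked, len(words)))
    positions = set(rng.sample(range(len(words)), k))
    return [MASK_TOKEN if i in positions else w for i, w in enumerate(words)]


# Hypothetical ASR output for a single spoken input.
transcript = "play jazz music in the kitchen".split()
masked = mask_transcript(transcript, num_masked=1, rng=random.Random(0))
print(" ".join(masked))
```

Passing a seeded random.Random instance keeps the masking reproducible across runs, which can help when comparing training configurations. An actual system could instead select words by position, by part of speech, or by slot type.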