US 11,869,495 B2
Voice to voice natural language understanding processing
Saiprasad Satya Kapila, Redmond, WA (US); Antonio Melis, Seattle, WA (US); Manikya Pavan Kiran Pothukuchi, Sammamish, WA (US); Steven Rabuchin, Kirkland, WA (US); Robert Pulciani, Sammamish, WA (US); and Robert William Serr, Kirkland, WA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Jul. 9, 2020, as Appl. No. 16/925,045.
Application 16/925,045 is a continuation of application No. 16/007,691, filed on Jun. 13, 2018, granted, now Pat. No. 10,720,157.
Prior Publication US 2020/0395016 A1, Dec. 17, 2020
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 15/22 (2006.01); G10L 15/18 (2013.01); G06Q 30/0601 (2023.01); G10L 15/30 (2013.01); G10L 13/00 (2006.01)
CPC G10L 15/22 (2013.01) [G06Q 30/0621 (2013.01); G06Q 30/0635 (2013.01); G10L 13/00 (2013.01); G10L 15/1815 (2013.01); G10L 15/30 (2013.01); G10L 15/1807 (2013.01); G10L 2015/223 (2013.01)]
18 Claims
 
1. A computer-implemented method comprising:
receiving, from a first device, first input audio data corresponding to a first utterance;
performing speech processing using the first input audio data to determine first intent data;
determining first data is needed to execute a first action corresponding to the first intent data;
determining second data corresponding to a request for the first data;
sending the second data to the first device;
performing processing with regard to the first intent data to determine first output data;
storing the first output data;
after storing the first output data, receiving, from a second device, second input audio data corresponding to a second utterance;
performing speech processing using the second input audio data and the first output data to determine second intent data;
performing processing with regard to the second intent data to determine second output data;
performing speech synthesis using the second output data to determine output audio data responsive to the second utterance; and
sending the output audio data to the second device for output.
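
The sequence of steps in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: every name here (VoiceSystem, REQUIRED_SLOTS, recognize, understand, synthesize, the account key, and the device identifiers) is hypothetical, and speech recognition and speech synthesis are replaced by stand-ins that treat the "audio" bytes as UTF-8 text. The claim does not say how the two devices are linked; keying the stored output by a shared account is an assumption made only for this illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical slot requirements; the claim only says "first data is needed".
REQUIRED_SLOTS = {"order_item": ["item", "quantity"]}


@dataclass
class Intent:
    name: str
    slots: dict = field(default_factory=dict)


def recognize(audio: bytes) -> str:
    """Stand-in for speech recognition: treats the bytes as UTF-8 text."""
    return audio.decode("utf-8").lower()


def understand(text: str, context: Optional[dict] = None) -> Intent:
    """Toy intent detection; `context` is previously stored output data."""
    words = text.split()
    if "order" in words and words[-1] != "order":
        return Intent("order_item", {"item": words[-1]})
    if "status" in words and context and "item" in context:
        return Intent("order_status", {"item": context["item"]})
    return Intent("unknown")


def synthesize(text: str) -> bytes:
    """Stand-in for speech synthesis: returns the text as bytes."""
    return text.encode("utf-8")


class VoiceSystem:
    def __init__(self) -> None:
        self.stored = {}   # account -> first output data (assumed keying)
        self.outbox = []   # (device, payload) pairs "sent" to devices

    def first_utterance(self, account: str, device: str, audio: bytes) -> dict:
        intent = understand(recognize(audio))            # first intent data
        needed = [s for s in REQUIRED_SLOTS.get(intent.name, [])
                  if s not in intent.slots]              # first data needed
        if needed:
            # Second data: a request for the missing first data.
            self.outbox.append((device, {"request_for": needed[0]}))
        first_output = {"intent": intent.name, **intent.slots}
        self.stored[account] = first_output              # store first output
        return first_output

    def second_utterance(self, account: str, device: str, audio: bytes) -> bytes:
        context = self.stored.get(account)
        # Speech processing uses both the new audio and the stored output.
        intent = understand(recognize(audio), context)   # second intent data
        if intent.name == "order_status":
            second_output = f"Your order for {intent.slots['item']} is pending."
        else:
            second_output = "Sorry, I did not understand."
        output_audio = synthesize(second_output)
        self.outbox.append((device, output_audio))       # send to second device
        return output_audio


system = VoiceSystem()
system.first_utterance("acct-1", "kitchen-speaker", b"Order batteries")
reply = system.second_utterance("acct-1", "car-unit", b"What is the status")
print(reply.decode("utf-8"))
```

In this sketch the second utterance never names an item; the second intent is resolved only because the output stored after the first utterance is passed into understand as context, which mirrors the claim's use of "the second input audio data and the first output data" together.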