US 12,230,268 B2
Contextual voice user interface
Michael James Moniz, Seattle, WA (US); Abishek Ravi, Seattle, WA (US); Ryan Scott Aldrich, Seattle, WA (US); and Michael Bennett Adams, Seattle, WA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Jan. 30, 2023, as Appl. No. 18/161,561.
Application 18/161,561 is a continuation of application No. 16/599,368, filed on Oct. 11, 2019, granted, now 11,594,215.
Application 16/599,368 is a continuation of application No. 15/634,780, filed on Jun. 27, 2017, granted, now 10,446,147, issued on Oct. 15, 2019.
Prior Publication US 2023/0317074 A1, Oct. 5, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 15/22 (2006.01); G06F 40/30 (2020.01); G10L 15/06 (2013.01); G10L 15/26 (2006.01)

CPC G10L 15/22 (2013.01) [G06F 40/30 (2020.01); G10L 15/063 (2013.01); G10L 15/26 (2013.01); G10L 2015/223 (2013.01)]

10 Claims

1. A computer-implemented method, comprising:

receiving first input data corresponding to a first utterance detected by at least one microphone of a speech-detection device;

determining, using a trained machine learning (ML) model, that the first input data represents a request to receive an explanation of processing related to second input data received prior to the first input data, the second input data corresponding to a previously detected utterance and the processing corresponding to a previous audio output responsive to the second input data;

determining natural language understanding (NLU) data representing an NLU hypothesis determined using the trained ML model and the second input data;

based on the request represented in the first input data and the NLU data, generating output data including a first portion representing the NLU hypothesis and a second portion representing an explanation that the first portion corresponds to an intent that was determined for the previously detected utterance; and

causing the speech-detection device to provide a first audio output corresponding to the output data, the first audio output including a first audio portion representing the NLU hypothesis and a second audio portion representing the explanation.
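
The flow recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation. The keyword-based `interpret` function merely stands in for the trained ML model, and every name here (`NluHypothesis`, `Turn`, `interpret`, `describe`, `handle_utterance`, the intent labels) is a hypothetical chosen for illustration. Text strings stand in for the audio input and output data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NluHypothesis:
    """Hypothetical container for an NLU hypothesis (intent plus slots)."""
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)


@dataclass
class Turn:
    """One earlier exchange: the utterance, its hypothesis, and the response."""
    utterance_text: str
    hypothesis: NluHypothesis
    response: str


def interpret(text: str) -> NluHypothesis:
    """Toy stand-in for the trained ML model (keyword rules, not a real model)."""
    lowered = text.lower()
    if "why did you" in lowered:
        return NluHypothesis("ExplainPreviousResponse")
    if lowered.startswith("play "):
        return NluHypothesis("PlayMusic", {"query": text[5:]})
    return NluHypothesis("Unknown")


def describe(hypothesis: NluHypothesis) -> str:
    """Render a hypothesis as speakable text."""
    slots = ", ".join(f"{k} = {v}" for k, v in hypothesis.slots.items())
    return f"{hypothesis.intent} ({slots})" if slots else hypothesis.intent


def handle_utterance(text: str, history: List[Turn]) -> str:
    """Return the text that would be synthesized as the audio output."""
    hypothesis = interpret(text)
    if hypothesis.intent == "ExplainPreviousResponse":
        if not history:
            return "There is no earlier request to explain."
        previous = history[-1]
        # First portion: the NLU hypothesis for the earlier utterance.
        first_portion = f"I understood: {describe(previous.hypothesis)}."
        # Second portion: the explanation tying that hypothesis to the intent.
        second_portion = (
            "That is the intent I determined for your earlier request, "
            f'"{previous.utterance_text}".'
        )
        return f"{first_portion} {second_portion}"
    response = f"Okay, handling {describe(hypothesis)}."
    history.append(Turn(text, hypothesis, response))
    return response


history: List[Turn] = []
first_reply = handle_utterance("play jazz", history)
explanation = handle_utterance("Why did you do that?", history)
```

Here the second call is recognized as a request for an explanation, so the reply combines the stored hypothesis for "play jazz" with a statement that it was the intent determined for that earlier utterance. The explanation request itself is not added to the history.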