| CPC G10L 15/22 (2013.01) [G06F 40/30 (2020.01); G10L 15/063 (2013.01); G10L 15/26 (2013.01); G10L 2015/223 (2013.01)] | 10 Claims | 

| 
            1. A computer-implemented method, comprising:
                receiving first input data corresponding to a first utterance detected by at least one microphone of a speech-detection device;
                determining, using a trained machine learning (ML) model, that the first input data represents a request to receive an explanation of processing related to second input data received prior to the first input data, the second input data corresponding to a previously detected utterance and the processing corresponding to a previous audio output responsive to the second input data;
                determining natural language understanding (NLU) data representing an NLU hypothesis determined using the trained ML model and the second input data;
                based on the request represented in the first input data and the NLU data, generating output data including a first portion representing the NLU hypothesis and a second portion representing an explanation that the first portion corresponds to an intent that was determined for the previously detected utterance; and
                causing the speech-detection device to provide a first audio output corresponding to the output data, the first audio output including a first audio portion representing the NLU hypothesis and a second audio portion representing the explanation.
               |
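The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`ToyModel`, `Turn`, `NluHypothesis`, `explain_previous`, the intent labels, and the cue phrases) is a hypothetical chosen for illustration. Keyword matching stands in for the trained ML model, and plain strings stand in for the input data and for the two audio portions that a text-to-speech stage would render.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class NluHypothesis:
    intent: str  # hypothetical intent label, e.g. "PlayMusic"
    text: str    # utterance text the hypothesis was derived from


@dataclass
class Turn:
    """A previously processed utterance and the response given to it."""
    utterance: str             # stands in for the "second input data"
    hypothesis: NluHypothesis  # NLU data determined for that utterance
    audio_output: str          # text of the previous audio output


class ToyModel:
    """Keyword-based stand-in for the trained ML model (an assumption)."""

    EXPLAIN_CUES = ("why did you", "why was that", "explain")

    def is_explanation_request(self, text: str) -> bool:
        lowered = text.lower()
        return any(cue in lowered for cue in self.EXPLAIN_CUES)

    def interpret(self, text: str) -> NluHypothesis:
        lowered = text.lower()
        if "play" in lowered:
            intent = "PlayMusic"
        elif "weather" in lowered:
            intent = "GetWeather"
        else:
            intent = "Unknown"
        return NluHypothesis(intent=intent, text=text)


def explain_previous(
    model: ToyModel, first_input: str, previous: Turn
) -> Optional[Tuple[str, str]]:
    """Return (hypothesis portion, explanation portion), or None when the
    first input is not a request for an explanation."""
    if not model.is_explanation_request(first_input):
        return None
    hyp = previous.hypothesis
    # First portion: states the NLU hypothesis for the earlier utterance.
    first_portion = f"I heard '{hyp.text}' and interpreted it as {hyp.intent}."
    # Second portion: explains that this hypothesis is the intent that was
    # determined for the earlier utterance and led to the earlier response.
    second_portion = (
        "That is the intent I determined for your earlier request, "
        f"so I responded with: {previous.audio_output}"
    )
    return first_portion, second_portion


if __name__ == "__main__":
    model = ToyModel()
    earlier = "play some jazz"
    turn = Turn(earlier, model.interpret(earlier), "Playing jazz.")
    portions = explain_previous(model, "Why did you do that?", turn)
    if portions is not None:
        for portion in portions:
            print(portion)  # a real device would synthesize speech here
```

In this sketch the same model object both classifies the follow-up utterance as an explanation request and produces the hypothesis for the earlier utterance, mirroring the claim's use of one trained ML model for both determinations. Storing the hypothesis on the `Turn` rather than recomputing it is a design choice of the sketch, not something the claim specifies.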