| CPC G10L 15/22 (2013.01) [G10L 15/063 (2013.01); G10L 15/16 (2013.01); G10L 2015/0638 (2013.01); G10L 2015/223 (2013.01)] | 20 Claims |

|
1. A computer-implemented method comprising:
receiving, from a device, first input audio data corresponding to a first spoken natural language user input, wherein the first input audio data is associated with user profile data;
using the first input audio data, performing speech processing to determine the first spoken natural language user input is to be responded to using a speech-based conversational assessment component;
generating first output data including a first question related to a speech-based conversational assessment;
sending the first output data to the device for presentation;
after sending the first output data, receiving, from the device, second input audio data corresponding to a second spoken natural language user input responsive to the first question;
using the second input audio data, performing automatic speech recognition (ASR) processing to generate ASR results data corresponding to the second spoken natural language user input;
processing the ASR results data to generate lexical embedding data corresponding to the second spoken natural language user input;
processing the second input audio data to determine tone data representing a tone of the second spoken natural language user input;
processing the ASR results data to determine first topic data representing a first topic of the second spoken natural language user input;
generating state data using the ASR results data, the lexical embedding data, the tone data, and the first topic data;
determining past state data associated with the user profile data, wherein the past state data corresponds to one or more speech-based conversational assessments;
processing the state data and the past state data using a first trained machine learning model to determine a first type of response to the second spoken natural language user input;
processing the state data and the past state data using a second trained machine learning model to determine a second type of response to the second spoken natural language user input;
based on at least one of the first type of response and the second type of response, generating second output data including a second question related to the speech-based conversational assessment; and
sending the second output data to the device for presentation.
|
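The data flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation. Every name in it (`StateData`, `build_state`, `next_question`, the `toy_*` helpers, the two stand-in models, and the response-type labels) is a hypothetical placeholder, and simple rules stand in for the trained machine learning models and for the ASR, embedding, tone, and topic components. The sketch only shows how state data may be assembled from the ASR results, lexical embedding, tone, and topic, then combined with past state data so that two separate models each produce a response type that shapes the second question.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class StateData:
    """State for one answered question (field names are illustrative)."""
    asr_text: str             # ASR results data
    lexical_embedding: tuple  # lexical embedding data
    tone: str                 # tone data derived from the audio
    topic: str                # topic data derived from the ASR results


# A "model" is any callable mapping (current state, past states) to a
# response-type label. A real system would use trained ML models here.
ResponseModel = Callable[[StateData, Sequence[StateData]], str]


def build_state(asr_text, audio, embed, detect_tone, detect_topic) -> StateData:
    """Combine ASR text, embedding, tone, and topic into one state record."""
    return StateData(
        asr_text=asr_text,
        lexical_embedding=tuple(embed(asr_text)),
        tone=detect_tone(audio),
        topic=detect_topic(asr_text),
    )


def next_question(state, past_states, first_model: ResponseModel,
                  second_model: ResponseModel) -> str:
    """Pick the second question from the two models' response types."""
    first_type = first_model(state, past_states)
    second_type = second_model(state, past_states)
    if first_type == "probe_topic":
        question = f"Can you tell me more about your {state.topic}?"
    else:
        question = "How have you been feeling otherwise?"
    if second_type == "empathetic":
        question = "I'm sorry to hear that. " + question
    return question


# --- Hypothetical stand-ins for the real components ---
def toy_embed(text: str) -> tuple:
    return (float(len(text.split())),)  # word count as a 1-d "embedding"


def toy_tone(audio: bytes) -> str:
    # Placeholder only: byte length is not a real acoustic feature.
    return "low" if len(audio) < 8 else "neutral"


def toy_topic(text: str) -> str:
    return "sleep" if "sleep" in text.lower() else "general"


def toy_first_model(state, past_states) -> str:
    # Probe further when the topic also appeared in a past assessment.
    recurring = any(p.topic == state.topic for p in past_states)
    return "probe_topic" if recurring else "move_on"


def toy_second_model(state, past_states) -> str:
    return "empathetic" if state.tone == "low" else "neutral"


if __name__ == "__main__":
    past = [StateData("I slept badly", (3.0,), "neutral", "sleep")]
    state = build_state("My sleep is still poor", b"\x00\x01",
                        toy_embed, toy_tone, toy_topic)
    print(next_question(state, past, toy_first_model, toy_second_model))
```

Passing the two models as separate callables mirrors the claim's distinction between a first and a second trained model operating on the same state data and past state data. How the two response types are actually combined is not specified in the claim, so the rule used in `next_question` is an assumption.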