US 12,254,878 B1
Natural language processing and classification
Kay Rottmann, Stuttgart (DE)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Dec. 1, 2021, as Appl. No. 17/539,716.
Int. Cl. G10L 15/00 (2013.01); G10L 15/06 (2013.01); G10L 15/18 (2013.01); G10L 15/22 (2006.01); G10L 15/30 (2013.01); G10L 15/16 (2006.01); G10L 25/30 (2013.01)

CPC G10L 15/22 (2013.01) [G10L 15/063 (2013.01); G10L 15/18 (2013.01); G10L 15/30 (2013.01); G10L 15/16 (2013.01); G10L 25/30 (2013.01)]

20 Claims

1. A computer-implemented method comprising:

receiving first audio data representing a first spoken natural language input responsive to a prior system output;

determining, using the first audio data and automatic speech recognition (ASR) processing, first ASR output data;

selecting, based on the first spoken natural language input being responsive to the prior system output, a first generator model from among a plurality of generator models, wherein the first generator model is configured to convert a first natural language input corresponding to a first sentiment to a second natural language input corresponding to a second sentiment, wherein the first generator model changes a first portion of the first natural language input to generate the second natural language input;

processing the first ASR output data, corresponding to the first sentiment, using the first generator model to determine first text data representing a first machine-generated input corresponding to the second sentiment, wherein the first generator model receives the first ASR output data as the first natural language input and produces the first text data as the second natural language input;

determining, using the first ASR output data and the first text data, a second portion of the first ASR output data different than a third portion of the first text data, wherein the second portion causes the first spoken natural language input to be classified to the first sentiment;

based on determining the second portion, determining a first natural language output requesting feedback with respect to the prior system output;

determining, using the first natural language output and text-to-speech (TTS) processing, output audio data; and

causing a device to present the output audio data.
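
The comparison step in claim 1, which finds the portion of the ASR output that differs from the machine-generated text and uses it to request feedback, can be illustrated with a minimal sketch. This is not the patented implementation. The lexicon-based generator, the `GENERATORS` registry, the context key, and the wording of the prompt are all hypothetical stand-ins; a real system would use trained generator models, and the ASR and TTS stages are omitted here. The token-level diff uses Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for a trained generator model: a tiny lexicon that
# rewrites negative-sentiment words into positive-sentiment ones.
NEGATIVE_TO_POSITIVE = {"terrible": "great", "wrong": "right", "slow": "fast"}


def negative_to_positive_generator(text: str) -> str:
    """Toy generator: change only the sentiment-bearing tokens of the input."""
    return " ".join(NEGATIVE_TO_POSITIVE.get(tok, tok) for tok in text.split())


# Hypothetical registry of generator models, keyed by dialogue context
# (here, the input being a response to a prior system output).
GENERATORS = {"response_to_prior_output": negative_to_positive_generator}


def differing_portions(asr_text: str, generated_text: str):
    """Return (asr_portion, generated_portion) pairs for every token span
    where the ASR output and the generated text disagree."""
    a, b = asr_text.split(), generated_text.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def feedback_prompt(asr_text: str, context: str = "response_to_prior_output"):
    """Select a generator, rewrite the input, diff the two texts, and build a
    feedback request around the first differing portion (None if no change)."""
    generator = GENERATORS[context]
    generated = generator(asr_text)
    diffs = differing_portions(asr_text, generated)
    if not diffs:
        return None  # the generator changed nothing, so nothing to ask about
    changed_portion = diffs[0][0]
    return f'What about my last response was "{changed_portion}"?'
```

For example, calling `feedback_prompt("that answer was wrong")` rewrites the input to "that answer was right", isolates "wrong" as the differing portion, and builds the prompt around it. An input the toy generator leaves unchanged, such as "play some jazz", yields no prompt.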