CPC G10L 15/22 (2013.01) [G06F 3/167 (2013.01); G10L 15/16 (2013.01); G10L 15/18 (2013.01); G10L 15/30 (2013.01); G10L 2015/223 (2013.01); G10L 2015/227 (2013.01); G10L 17/00 (2013.01)] | 18 Claims
1. A computer-implemented method that, when executed on data processing hardware, causes the data processing hardware to perform operations comprising:
receiving audio data corresponding to a spoken utterance captured by a microphone;
processing, using a neural network-based speech recognizer, the audio data to generate a textual representation of the spoken utterance;
generating an audio representation of the audio data;
processing, using a neural network-based utterance classifier trained on negative training examples from utterances not directed toward automated assistants, a concatenation of the textual representation of the spoken utterance and the audio representation of the audio data to determine that the spoken utterance is not directed toward an automated assistant; and
based on determining that the spoken utterance is not directed toward the automated assistant:
generating instructions for the automated assistant to not generate a response to the spoken utterance; and
deleting the audio data characterizing the spoken utterance from memory hardware in communication with the data processing hardware.
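The operations of claim 1 can be illustrated with a minimal sketch. The claim does not specify the recognizer, the encoders, or the classifier architecture, so everything below is an assumption: `embed_text`, `embed_audio`, `UtteranceClassifier`, and `handle_utterance` are hypothetical names; a hashed bag of words and per-frame RMS energy stand in for the neural text and audio representations; and a single logistic unit stands in for the trained neural network-based utterance classifier (its training on negative examples is not shown). The sketch shows only the data flow: recognize, build both representations, concatenate, classify, and on a "not directed" decision suppress the response and delete the stored audio.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List

EMBED_DIM = 8  # hypothetical size of each representation


def embed_text(text: str, dim: int = EMBED_DIM) -> List[float]:
    """Toy stand-in for a text encoder: hashed bag of words, L1-normalized."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # Deterministic hash (sum of code points) so results are reproducible.
        vec[sum(ord(c) for c in word) % dim] += 1.0
    total = sum(vec) or 1.0
    return [v / total for v in vec]


def embed_audio(samples: List[float], dim: int = EMBED_DIM) -> List[float]:
    """Toy stand-in for an audio encoder: RMS energy of `dim` equal-length frames."""
    frame = max(1, len(samples) // dim)
    out = []
    for i in range(dim):
        chunk = samples[i * frame:(i + 1) * frame]
        out.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)) if chunk else 0.0)
    return out


@dataclass
class UtteranceClassifier:
    """Hypothetical logistic unit standing in for the trained neural classifier."""
    weights: List[float]
    bias: float = 0.0

    def prob_directed(self, features: List[float]) -> float:
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))


def handle_utterance(
    utt_id: str,
    memory: Dict[str, List[float]],
    recognize: Callable[[List[float]], str],
    classifier: UtteranceClassifier,
    threshold: float = 0.5,
) -> dict:
    samples = memory[utt_id]                 # audio data captured by the microphone
    text = recognize(samples)                # textual representation
    features = embed_text(text) + embed_audio(samples)  # concatenation of both
    if classifier.prob_directed(features) < threshold:
        del memory[utt_id]                   # delete the audio data from memory
        return {"respond": False}            # instruct the assistant not to respond
    return {"respond": True, "text": text}
```

With a classifier whose bias pushes the probability below the threshold, the call returns `{"respond": False}` and the utterance's audio is no longer in `memory`. Concatenating the two representations lets one classifier weigh lexical cues (what was said) together with acoustic cues (how it was said) in a single decision.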