US 12,230,271 B2
Utterance classifier
Nathan David Howard, Mountain View, CA (US); Gabor Simko, Santa Clara, CA (US); Maria Carolina Parada San Martin, Boulder, CO (US); Ramkarthik Kalyanasundaram, Cupertino, CA (US); Guru Prakash Arumugam, Sunnyvale, CA (US); and Srinivas Vasudevan, Mountain View, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Dec. 1, 2023, as Appl. No. 18/526,991.
Application 18/526,991 is a continuation of application No. 17/804,657, filed on May 31, 2022, granted, now 11,848,018.
Application 17/804,657 is a continuation of application No. 16/935,112, filed on Jul. 21, 2020, granted, now 11,361,768, issued on Jun. 14, 2022.
Application 16/935,112 is a continuation of application No. 16/401,349, filed on May 2, 2019, granted, now 11,545,147, issued on Jan. 3, 2023.
Application 16/401,349 is a continuation of application No. 15/659,016, filed on Jul. 25, 2017, granted, now 10,311,872, issued on Jun. 4, 2019.
Prior Publication US 2024/0096326 A1, Mar. 21, 2024
Int. Cl. G10L 15/08 (2006.01); G06F 3/16 (2006.01); G10L 15/16 (2006.01); G10L 15/18 (2013.01); G10L 15/22 (2006.01); G10L 15/30 (2013.01); G10L 17/00 (2013.01)
CPC G10L 15/22 (2013.01) [G06F 3/167 (2013.01); G10L 15/16 (2013.01); G10L 15/18 (2013.01); G10L 15/30 (2013.01); G10L 2015/223 (2013.01); G10L 2015/227 (2013.01); G10L 17/00 (2013.01)] 18 Claims
OG exemplary drawing
 
1. A computer-implemented method executed on data processing hardware that causes the data processing hardware to perform operations comprising:
    receiving audio data corresponding to a spoken utterance captured by a microphone;
    processing, using a neural network-based speech recognizer, the audio data to generate a textual representation of the spoken utterance;
    generating an audio representation of the audio data;
    processing, using a neural network-based utterance classifier trained on negative training examples from utterances not directed toward automated assistants, a concatenation of the textual representation of the spoken utterance and the audio representation of the audio data to determine that the spoken utterance is not directed toward an automated assistant; and
    based on determining that the spoken utterance is not directed toward the automated assistant:
        generating instructions for the automated assistant to not generate a response to the spoken utterance; and
        deleting the audio data characterizing the spoken utterance from memory hardware in communication with the data processing hardware.
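
The method of claim 1 describes a two-branch gating pipeline: the recognizer's transcript and an audio-derived representation are concatenated and scored by a trained classifier, and a negative decision both suppresses the assistant's response and deletes the buffered audio. The following is a minimal, hypothetical Python (PyTorch) sketch of that flow. Every name, the embedding sizes, and the 0.5 decision threshold are assumptions introduced for illustration; the patent claims the method, not any particular implementation.

```python
# Minimal, hypothetical sketch of the pipeline in claim 1 (PyTorch).
# All module names, embedding sizes, and the decision threshold are
# illustrative assumptions; the patent does not specify an implementation.
import torch
import torch.nn as nn


class UtteranceClassifier(nn.Module):
    """Scores whether a spoken utterance is directed toward the assistant.

    Takes the concatenation of a text embedding (derived from the speech
    recognizer's transcript) and an audio embedding, mirroring the
    concatenation step recited in the claim. Per the claim, this model is
    trained with negative examples drawn from utterances not directed
    toward automated assistants; training is omitted here.
    """

    def __init__(self, text_dim: int = 128, audio_dim: int = 128) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(text_dim + audio_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),  # logit for "directed toward the assistant"
        )

    def forward(self, text_emb: torch.Tensor, audio_emb: torch.Tensor) -> torch.Tensor:
        joint = torch.cat([text_emb, audio_emb], dim=-1)  # the claimed concatenation
        return self.net(joint)


def handle_utterance(audio, recognizer, audio_encoder, classifier, store, key,
                     threshold: float = 0.5) -> bool:
    """Returns True if the assistant should respond.

    When the classifier judges the utterance as not directed toward the
    assistant, the response is suppressed (by returning False) and the
    buffered audio is deleted, as recited in the final two claim steps.
    """
    text_emb = recognizer(audio)      # textual representation, embedded
    audio_emb = audio_encoder(audio)  # audio representation
    score = torch.sigmoid(classifier(text_emb, audio_emb)).item()
    if score < threshold:
        store.pop(key, None)          # delete the audio data from memory
        return False                  # do not generate a response
    return True
```

A stand-in invocation, with dummy embedders in place of the speech recognizer and audio encoder:

```python
clf = UtteranceClassifier()
audio = torch.randn(16000)  # placeholder for 1 s of 16 kHz audio
buffer = {"utt-1": audio}
respond = handle_utterance(
    audio,
    recognizer=lambda a: torch.randn(128),     # stand-in ASR text embedding
    audio_encoder=lambda a: torch.randn(128),  # stand-in audio embedding
    classifier=clf,
    store=buffer,
    key="utt-1",
)
```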