CPC G10L 15/08 (2013.01) [G06N 20/00 (2019.01); G10L 15/22 (2013.01); G10L 2021/02163 (2013.01)] | 19 Claims |
1. A method implemented by one or more processors, the method comprising:
detecting, via one or more microphones of an automated assistant device of a user, a stream of audio data that captures a spoken utterance of the user and that captures ambient noise occurring within a threshold time period of the spoken utterance being spoken by the user;
processing a first portion of the stream of audio data, that captures the ambient noise, to determine one or more classifications for the ambient noise;
processing a second portion of the stream of audio data, that captures the spoken utterance of the user, to generate a transcription of the spoken utterance;
processing, using a machine learning model, both the transcription of the spoken utterance and the one or more classifications of the ambient noise to generate:
a user intent, and
one or more parameters for the user intent;
performing one or more automated assistant actions based on the user intent and using the one or more parameters;
detecting, via one or more of the microphones of the automated assistant device of the user, a subsequent stream of audio data that captures the same spoken utterance of the user and that captures different ambient noise occurring within the threshold time period of the same spoken utterance being spoken by the user;
processing a subsequent stream first portion of the subsequent stream of audio data, that captures the different ambient noise, to determine one or more different classifications for the different ambient noise, the one or more different classifications for the different ambient noise being different than the one or more classifications for the ambient noise;
processing a subsequent stream second portion of the subsequent stream of audio data, that captures the same spoken utterance of the user, to generate a subsequent stream transcription of the same spoken utterance;
processing, using the machine learning model, both the subsequent stream transcription of the same spoken utterance and the one or more different classifications of the different ambient noise to generate:
a subsequent user intent that is different from the user intent, and
one or more additional parameters for the subsequent user intent; and
performing one or more different automated assistant actions based on the subsequent user intent and using the one or more additional parameters, wherein the one or more different automated assistant actions are different from the one or more automated assistant actions.
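The pipeline recited in claim 1 can be sketched in code. This is a minimal illustrative sketch, not the claimed implementation: all function names, noise labels, intents, and parameters below are hypothetical, the "audio" segments are string tags so the sketch stays self-contained, and the claimed machine learning model is stood in for by a lookup. The sketch shows only the claimed behavior that the same spoken utterance yields different intents under different ambient-noise classifications.

```python
# Illustrative sketch of the claim-1 pipeline. Every name, label, and
# intent here is hypothetical; the claim does not specify a particular
# classifier, speech recognizer, or model architecture.

def classify_ambient_noise(noise_segment: str) -> list[str]:
    # Stand-in for processing the first portion of the audio stream to
    # classify ambient noise; toy string tags replace real audio.
    catalog = {"loud-music": ["music"], "tv-audio": ["television"]}
    return catalog.get(noise_segment, ["unknown"])

def transcribe(speech_segment: str) -> str:
    # Stand-in for processing the second portion of the stream to
    # transcribe the spoken utterance; the toy "audio" already is text.
    return speech_segment

def interpret(transcription: str, noise_classes: list[str]) -> tuple[str, dict]:
    # Stand-in for the claimed machine learning model, which consumes both
    # the transcription and the noise classifications. The same utterance
    # maps to different intents under different ambient noise.
    if transcription == "turn it down":
        if "music" in noise_classes:
            return "lower_speaker_volume", {"device": "speaker", "step": 2}
        if "television" in noise_classes:
            return "lower_tv_volume", {"device": "tv", "step": 2}
    return "unknown_intent", {}

def handle_stream(noise_segment: str, speech_segment: str) -> tuple[str, dict]:
    # Mirrors the claimed sequence: classify the ambient noise, transcribe
    # the utterance, then derive an intent and its parameters.
    noise_classes = classify_ambient_noise(noise_segment)
    transcription = transcribe(speech_segment)
    intent, params = interpret(transcription, noise_classes)
    return intent, params  # a real assistant would now perform the action

# Same spoken utterance, different ambient noise -> different intents,
# as in the "subsequent stream" clauses of the claim.
first = handle_stream("loud-music", "turn it down")
second = handle_stream("tv-audio", "turn it down")
print(first)   # ('lower_speaker_volume', {'device': 'speaker', 'step': 2})
print(second)  # ('lower_tv_volume', {'device': 'tv', 'step': 2})
```

The two calls correspond to the initial and subsequent streams of the claim: the transcription is identical, so the divergence in intent (and hence in the performed action) comes entirely from the differing ambient-noise classifications.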