US 12,106,758 B2
Voice commands for an automated assistant utilized in smart dictation
Victor Carbune, Zurich (CH); Alvin Abdagic, Zurich (CH); Behshad Behzadi, Freienbach (CH); Jacopo Sannazzaro Natta, Berkeley, CA (US); Julia Proskurnia, Zurich (CH); Krzysztof Andrzej Goj, Zurich (CH); Srikanth Pandiri, Zurich (CH); Viesturs Zarins, Zurich (CH); Nicolo D'Ercole, Oberrieden (CH); Zaheed Sabur, Baar (CH); and Luv Kothari, Sunnyvale, CA (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by GOOGLE LLC, Mountain View, CA (US)
Filed on May 17, 2021, as Appl. No. 17/322,765.
Prior Publication US 2022/0366910 A1, Nov. 17, 2022
Int. Cl. G06F 3/01 (2006.01); G06F 3/0488 (2022.01); G06N 20/00 (2019.01); G10L 15/18 (2013.01); G10L 15/22 (2006.01); G10L 15/26 (2006.01)

CPC G10L 15/26 (2013.01) [G06F 3/0488 (2013.01); G06N 20/00 (2019.01); G10L 15/18 (2013.01); G10L 15/22 (2013.01); G10L 2015/223 (2013.01)]

20 Claims

1. A method implemented by one or more processors, the method comprising:

receiving audio data that captures a spoken utterance of a user of a client device, the audio data being generated by one or more microphones of the client device;

determining whether touch input of the user is being simultaneously directed to a transcription, that is displayed at the client device via a software application accessible at the client device, at the same time the audio data that captures the spoken utterance is received;

in response to determining that no touch input of the user is being simultaneously directed to the transcription at the same time the audio data that captures the spoken utterance is received:

determining to incorporate recognized text, that corresponds to the spoken utterance, into the transcription;

in response to determining that touch input of the user is being simultaneously directed to the transcription at the same time the audio data that captures the spoken utterance is received:

determining, based on one or more terms of the spoken utterance, whether to:

incorporate the recognized text, that corresponds to the spoken utterance, into the transcription, or

perform an assistant command that is associated with the transcription and that is based on the recognized text that corresponds to the spoken utterance;

in response to determining to incorporate the recognized text that corresponds to the spoken utterance into the transcription:

automatically incorporating the recognized text that corresponds to the spoken utterance into the transcription; and

in response to determining to perform the assistant command that is associated with the transcription and that is based on the recognized text that corresponds to the spoken utterance:

causing an automated assistant to perform the assistant command that is associated with the transcription and that is based on the recognized text that corresponds to the spoken utterance.
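The decision flow in claim 1 can be illustrated with a small sketch. It involves two checks. The first asks whether any touch input directed to the transcription overlaps in time with the audio of the spoken utterance. The second applies only when there is such overlap, and uses the terms of the recognized text to choose between dictation and an assistant command. Several details are assumptions made only for illustration and are not specified by the claim: the `Interval` overlap test, the names `route_utterance` and `touch_during_utterance`, the hypothetical `COMMAND_TERMS` set, and the rule of checking only the first term. The sketch also assumes that touch events have already been filtered to those landing on the displayed transcription.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical command vocabulary; the claim only says the decision is
# "based on one or more terms of the spoken utterance".
COMMAND_TERMS = {"delete", "bold", "underline", "capitalize", "send", "undo"}


@dataclass
class Interval:
    """A time span in seconds (assumed representation of audio/touch timing)."""
    start: float
    end: float

    def overlaps(self, other: "Interval") -> bool:
        # Two half-open spans overlap if each starts before the other ends.
        return self.start < other.end and other.start < self.end


@dataclass
class Transcription:
    text: str = ""
    log: List[str] = field(default_factory=list)  # stand-in for assistant actions


def touch_during_utterance(audio: Interval, touches: List[Interval]) -> bool:
    """True if any touch on the transcription overlaps the utterance audio."""
    return any(touch.overlaps(audio) for touch in touches)


def route_utterance(recognized_text: str, audio: Interval,
                    touches: List[Interval], transcription: Transcription) -> str:
    """Return "dictate" or "command" and apply the chosen action (sketch)."""
    if not touch_during_utterance(audio, touches):
        # No simultaneous touch: always incorporate the text as dictation.
        decision = "dictate"
    else:
        # Simultaneous touch: inspect the terms (here, only the first term).
        terms = recognized_text.lower().split()
        decision = "command" if terms and terms[0] in COMMAND_TERMS else "dictate"

    if decision == "dictate":
        separator = " " if transcription.text else ""
        transcription.text += separator + recognized_text
    else:
        # Placeholder for causing an automated assistant to act on the transcription.
        transcription.log.append(f"assistant command: {recognized_text}")
    return decision


doc = Transcription()
print(route_utterance("Meet me at noon", Interval(0.0, 1.5), [], doc))  # dictate
print(route_utterance("bold that", Interval(2.0, 3.0), [Interval(2.5, 4.0)], doc))  # command
print(route_utterance("see you there", Interval(5.0, 6.0), [Interval(2.5, 4.0)], doc))  # dictate
print(doc.text)  # Meet me at noon see you there
```

In this sketch the same words are treated differently depending on timing. "bold that" becomes a command only because a touch overlapped it. The same phrase spoken with no touch on the transcription would simply be dictated.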