US 12,266,359 B2
System(s) and method(s) to enable modification of an automatically arranged transcription in smart dictation
Nicolo D'Ercole, Oberrieden (CH); Shumin Zhai, Zurich (CH); Swante Scholz, Zurich (CH); Mehek Sharma, Thalwil (CH); Adrien Olczak, Irvine, CA (US); Akshay Kannan, Fremont, CA (US); Alvin Abdagic, Zurich (CH); Julia Proskurnia, Zurich (CH); and Viesturs Zarins, Zurich (CH)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by GOOGLE LLC, Mountain View, CA (US)
Filed on Sep. 2, 2022, as Appl. No. 17/902,560.
Claims priority of provisional application 63/390,830, filed on Jul. 20, 2022.
Prior Publication US 2024/0029728 A1, Jan. 25, 2024
Int. Cl. G10L 15/22 (2006.01); G06F 16/683 (2019.01); G10L 15/08 (2006.01)

CPC G10L 15/22 (2013.01) [G06F 16/685 (2019.01); G10L 15/08 (2013.01)]

18 Claims

1. A method implemented by one or more processors, the method comprising:

receiving audio data that captures a spoken utterance of a user of a client device, the audio data being generated by one or more microphones of the client device;

processing, using an automatic speech recognition (ASR) model, the audio data that captures the spoken utterance of the user to generate textual data corresponding to the spoken utterance;

determining, based on the audio data that captures the spoken utterance and/or based on the textual data corresponding to the spoken utterance, whether the user has specified an arrangement of the textual data for a transcription of the spoken utterance;

in response to determining that the user has not specified the arrangement of the textual data for the transcription:

generating, based on the textual data corresponding to the spoken utterance, a transcription of the spoken utterance that is automatically arranged, the transcription that is automatically arranged including at least an automatic punctuation mark following a given term that is included in the textual data and an automatic capitalization of a subsequent term that is included in the textual data and that is subsequent to the given term;

causing the transcription to be provided for presentation to the user via a display of the client device;

generating a modification selectable element that, when selected by the user, causes the transcription that is automatically arranged to be modified to remove the automatic punctuation mark and/or the automatic capitalization;

receiving touch input from the user via the display of the client device, the touch input being directed to the transcription of the spoken utterance; and

in response to receiving the touch input from the user that is directed to the transcription of the spoken utterance and based on the transcription of the spoken utterance being generated based on the transcription being automatically arranged:

causing the modification selectable element to be provided for presentation to the user via the display of the client device; and

in response to determining that the user has specified the arrangement of the textual data for the transcription:

generating, based on the textual data corresponding to the spoken utterance and based on the arrangement specified by the user, the transcription of the spoken utterance;

causing the transcription to be provided for presentation to the user via the display of the client device; and

refraining from generating the modification selectable element.
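
The branching recited in claim 1 can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the claimed implementation. The ASR step is assumed to have already produced lowercase textual data split into segments by a hypothetical upstream segmenter. The test for whether the user specified an arrangement is reduced to spotting a few spoken punctuation commands. Automatic arrangement is reduced to capitalizing each segment and ending it with a period. All names (Transcription, SPOKEN_COMMANDS, build_transcription, on_touch, select_modification) are hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical spoken commands that signal the user is arranging the text.
SPOKEN_COMMANDS = {"period": ".", "comma": ",", "question mark": "?"}


@dataclass
class Transcription:
    text: str            # what is presented on the display
    raw_text: str        # unarranged textual data from the ASR step
    auto_arranged: bool  # True only when the system chose the arrangement


def user_specified_arrangement(textual_data: str) -> bool:
    """Toy check (assumption): did the user dictate a punctuation command?"""
    padded = f" {textual_data} "
    return any(f" {cmd} " in padded for cmd in SPOKEN_COMMANDS)


def apply_user_arrangement(textual_data: str) -> str:
    """Replace each spoken command with the mark it names."""
    out = f" {textual_data} "
    for cmd, mark in SPOKEN_COMMANDS.items():
        out = out.replace(f" {cmd} ", f"{mark} ")
    return out.strip()


def auto_arrange(segments: List[str]) -> str:
    """Add an automatic period after each segment and capitalize the next one."""
    return " ".join(s[:1].upper() + s[1:] + "." for s in segments if s)


def build_transcription(segments: List[str]) -> Transcription:
    raw = " ".join(segments)
    if user_specified_arrangement(raw):
        # User-specified branch: no modification element is generated.
        return Transcription(apply_user_arrangement(raw), raw, auto_arranged=False)
    # Automatic branch: the transcription is eligible for the modification element.
    return Transcription(auto_arrange(segments), raw, auto_arranged=True)


def on_touch(t: Transcription) -> Optional[str]:
    """Return the label of the modification element to present, or None."""
    return "Undo auto-formatting" if t.auto_arranged else None


def select_modification(t: Transcription) -> Transcription:
    """Selecting the element removes the automatic punctuation and capitalization."""
    return Transcription(t.raw_text, t.raw_text, auto_arranged=False)


if __name__ == "__main__":
    t = build_transcription(["i am running late", "see you soon"])
    print(t.text, "|", on_touch(t))
    print(select_modification(t).text)
```

In this sketch the `auto_arranged` flag carries the decision made when the transcription was generated, so the touch handler only presents the element for transcriptions the system arranged itself, which mirrors the "refraining" limitation in the user-specified branch.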