US 12,148,421 B2
Using large language model(s) in generating automated assistant response(s)
Martin Baeuml, Zurich (CH); Thushan Amarasiriwardena, Alameda, CA (US); Roberto Pieraccini, Zurich (CH); Vikram Sridar, Zurich (CH); Daniel De Freitas Adiwardana, San Francisco, CA (US); Noam M. Shazeer, Palo Alto, CA (US); and Quoc Le, Sunnyvale, CA (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by GOOGLE LLC, Mountain View, CA (US)
Filed on Nov. 22, 2021, as Appl. No. 17/532,794.
Claims priority of provisional application 63/241,232, filed on Sep. 7, 2021.
Prior Publication US 2023/0074406 A1, Mar. 9, 2023
Int. Cl. G10L 15/22 (2006.01); G06F 16/9032 (2019.01); G10L 15/183 (2013.01)

CPC G10L 15/183 (2013.01) [G06F 16/90332 (2019.01); G10L 15/22 (2013.01)]

17 Claims

1. A method implemented by one or more processors, the method comprising:

as part of a dialog session between a user of a client device and an automated assistant implemented by the client device:

receiving a stream of audio data that captures a spoken utterance of the user, the stream of audio data being generated by one or more microphones of the client device, and the spoken utterance including an assistant query;

determining, based on processing the stream of audio data, a set of assistant outputs, each assistant output in the set of assistant outputs being responsive to the assistant query included in the spoken utterance, wherein determining the set of assistant outputs comprises:

processing, using an automatic speech recognition (ASR) model, the stream of audio data to generate ASR output;

processing, using a natural language understanding (NLU) model, the ASR output to generate NLU output; and

causing the set of assistant outputs to be determined based on the NLU output;

processing the set of assistant outputs and context of the dialog session to:

generate a set of modified assistant outputs using one or more large language model (LLM) outputs generated using an LLM, each of the one or more LLM outputs being determined by processing, using the LLM, at least part of the context of the dialog session and one or more of the assistant outputs included in the set of assistant outputs, and

generate an additional assistant query that is related to the spoken utterance based on at least part of the context of the dialog session and at least part of the assistant query, wherein generating the additional assistant query that is related to the spoken utterance comprises:

determining, based on the NLU output, an intent associated with the assistant query that is included in the spoken utterance;

identifying, based on the intent associated with the assistant query, at least one related intent that is related to the intent associated with the assistant query that is included in the spoken utterance; and

generating the additional assistant query that is related to the spoken utterance based on the at least one related intent;

determining, based on the additional assistant query, additional assistant output that is responsive to the additional assistant query;

processing, based on the additional assistant output that is responsive to the additional assistant query, the set of modified assistant outputs to generate a set of additional modified assistant outputs; and

causing a given additional modified assistant output, from the set of additional modified assistant outputs, to be provided for presentation to the user.
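
The flow of claim 1 can be illustrated with a minimal Python sketch. This is not the patented implementation: every name here (DialogContext, RELATED_INTENTS, asr, nlu, fulfill, llm_rewrite, respond) is hypothetical, the ASR, NLU, and LLM stages are replaced by trivial stubs, and the related-intent table is an assumed toy mapping. It only shows the order in which the claimed steps might feed one another.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical toy mapping from an intent to related intents (assumption).
RELATED_INTENTS = {"weather": ["surf_report"]}

@dataclass
class DialogContext:
    """Hypothetical stand-in for the 'context of the dialog session'."""
    user_name: str
    history: list[str] = field(default_factory=list)

def asr(audio: bytes) -> str:
    # Stub ASR model: treats the audio bytes as UTF-8 text.
    return audio.decode("utf-8")

def nlu(asr_output: str) -> dict:
    # Stub NLU model: keyword match to pick an intent.
    intent = "weather" if "weather" in asr_output.lower() else "unknown"
    return {"intent": intent, "text": asr_output}

def fulfill(intent: str) -> list[str]:
    # Stub fulfillment: canned assistant outputs per intent.
    canned = {
        "weather": ["It is sunny and 75 degrees."],
        "surf_report": ["Waves are 3 to 4 feet."],
    }
    return canned.get(intent, ["Sorry, I do not know."])

def llm_rewrite(outputs: list[str], context: DialogContext) -> list[str]:
    # Stub for the LLM step: personalizes each output using the context.
    return [f"{context.user_name}, {o[0].lower()}{o[1:]}" for o in outputs]

def respond(audio: bytes, context: DialogContext) -> str:
    asr_output = asr(audio)                            # ASR output
    nlu_output = nlu(asr_output)                       # NLU output
    outputs = fulfill(nlu_output["intent"])            # set of assistant outputs
    modified = llm_rewrite(outputs, context)           # modified assistant outputs
    context.history.append(asr_output)

    related = RELATED_INTENTS.get(nlu_output["intent"], [])
    if not related:
        return modified[0]
    additional_query = related[0]                      # additional assistant query
    additional_output = fulfill(additional_query)      # additional assistant output
    # Fold the additional output into each modified output.
    additional_modified = [
        f"{m} {extra}" for m in modified for extra in additional_output
    ]
    return additional_modified[0]                      # output presented to user
```

For example, under these stubs, respond(b"what is the weather", DialogContext(user_name="Sam")) yields a weather answer followed by the surf report obtained from the related intent. A real system would replace each stub with a trained model and would rank the candidate outputs rather than taking the first.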