US 12,308,020 B2
Processing concurrently received utterances from multiple users
Neil Dhillon, Mountain View, CA (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Appl. No. 17/782,496
Filed by GOOGLE LLC, Mountain View, CA (US)
PCT Filed Dec. 11, 2019, PCT No. PCT/US2019/065619 § 371(c)(1), (2) Date Jun. 3, 2022, PCT Pub. No. WO2021/118549, PCT Pub. Date Jun. 17, 2021.
Prior Publication US 2023/0169959 A1, Jun. 1, 2023
Int. Cl. G10L 15/18 (2013.01); G06F 3/16 (2006.01); G10L 15/22 (2006.01)

CPC G10L 15/18 (2013.01) [G06F 3/167 (2013.01); G10L 15/22 (2013.01); G10L 2015/223 (2013.01)]

18 Claims

1. A method implemented by one or more processors, the method comprising:

processing, at a computing device, audio data that captures a first spoken utterance, spoken by a first user, and a second spoken utterance, spoken by a second user,

wherein at least a portion of the first spoken utterance overlaps with at least a portion of the second spoken utterance in the audio data;

determining, based on processing the audio data, that the first spoken utterance provided by the first user embodies a request that is directed to an automated assistant that is accessible via the computing device;

causing, based on determining that the first spoken utterance embodies the request directed to the automated assistant, an interactive element to be rendered at a graphical user interface of the computing device,

wherein the interactive element includes natural language content that characterizes the request embodied by the first spoken utterance;

determining, based on processing the audio data, whether the second spoken utterance embodies an additional request that is directed to the automated assistant; and

when the second spoken utterance is determined to embody the additional request directed to the automated assistant:

causing an additional interactive element to be rendered at the graphical user interface of the computing device,

wherein the additional interactive element includes additional natural language content that characterizes the additional request embodied by the second spoken utterance;

receiving a selection of the interactive element or the additional interactive element; and

responsive to receiving the selection:

when the selection is of the interactive element:

causing the automated assistant to initialize one or more actions in furtherance of fulfilling the request, and

determining whether the additional interactive element has been selected at the graphical user interface within a threshold period of time, and

when the additional interactive element has been selected within the threshold period of time:

causing the automated assistant to initialize the one or more other actions in furtherance of fulfilling the additional request embodied in the second spoken utterance, and

when the additional interactive element has not been selected within the threshold period of time:

bypassing causing the automated assistant to initialize the one or more other actions in furtherance of fulfilling the additional request embodied in the second spoken utterance; and

when the selection is of the additional interactive element:

causing the automated assistant to initialize one or more other actions in furtherance of fulfilling the additional request.
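
The selection-handling branch of claim 1 can be illustrated with a minimal sketch. This is not the patented implementation. The names `Selection`, `resolve_selections`, `FIRST`, and `ADDITIONAL` are hypothetical, and the sketch assumes that both interactive elements have already been rendered and that the threshold period is measured from the time the first element is selected, which the claim does not specify.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical identifiers for the two rendered elements.
FIRST = "interactive_element"
ADDITIONAL = "additional_interactive_element"


@dataclass(frozen=True)
class Selection:
    element: str   # FIRST or ADDITIONAL
    time_s: float  # time of the selection, in seconds


def resolve_selections(selections: List[Selection], threshold_s: float) -> List[str]:
    """Return which requests have their actions initialized, in order.

    Assumption: `selections` is sorted by time, and the threshold window
    starts when the first interactive element is selected.
    """
    if not selections:
        return []

    initial = selections[0]
    if initial.element == ADDITIONAL:
        # Selection is of the additional element: fulfil only the additional request.
        return ["additional_request"]

    # Selection is of the interactive element: fulfil the first request ...
    initialized = ["request"]
    deadline = initial.time_s + threshold_s
    # ... then check whether the additional element is selected in time.
    for later in selections[1:]:
        if later.element == ADDITIONAL and later.time_s <= deadline:
            initialized.append("additional_request")
            break
    # If no timely selection occurred, the additional request is bypassed.
    return initialized
```

Under these assumptions, selecting the first element and then the additional element 2 seconds later with a 5-second threshold initializes actions for both requests, while a selection 7 seconds later leaves the additional request bypassed.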