US 11,922,945 B2
Voice to text conversion based on third-party agent content
Barnaby James, Los Gatos, CA (US); Bo Wang, San Jose, CA (US); Sunil Vemuri, Pleasanton, CA (US); David Schairer, San Jose, CA (US); Ulas Kirazci, Mountain View, CA (US); Ertan Dogrultan, Belmont, CA (US); and Petar Aleksic, Jersey City, NJ (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by GOOGLE LLC, Mountain View, CA (US)
Filed on Mar. 23, 2023, as Appl. No. 18/125,606.
Application 18/125,606 is a continuation of application No. 17/582,926, filed on Jan. 24, 2022, granted, now 11,626,115.
Application 17/582,926 is a continuation of application No. 16/791,334, filed on Feb. 14, 2020, granted, now 11,232,797, issued on Jan. 25, 2022.
Application 16/791,334 is a continuation of application No. 15/372,188, filed on Dec. 7, 2016, granted, now 10,600,418, issued on Mar. 24, 2020.
Prior Publication US 2023/0260517 A1, Aug. 17, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 15/26 (2006.01); G06F 40/205 (2020.01); G06F 40/284 (2020.01); G06F 40/30 (2020.01); G10L 15/18 (2013.01); G10L 15/183 (2013.01); G10L 15/22 (2006.01); G10L 15/30 (2013.01)

CPC G10L 15/26 (2013.01) [G06F 40/205 (2020.01); G06F 40/284 (2020.01); G06F 40/30 (2020.01); G10L 15/1815 (2013.01); G10L 15/183 (2013.01); G10L 15/22 (2013.01); G10L 15/30 (2013.01); G10L 2015/223 (2013.01); G10L 2015/228 (2013.01)]

17 Claims

1. A method implemented by one or more processors, the method comprising:

receiving, by a third-party agent, and from a local agent that is local to a voice-enabled electronic device of a user, an invocation request for content;

in response to receiving the invocation request for the content, generating, by the third-party agent, responsive content that is responsive to voice input, provided by the user of the voice-enabled electronic device, that is directed to the local agent, and that is to be provided by the local agent in response to the voice input via the voice-enabled electronic device;

identifying, by the third-party agent, one or more contextual parameters that are in addition to the responsive content and that are indicative of further voice input, anticipated to be provided by the user of the voice-enabled electronic device, in response to the responsive content being output for presentation to the user via the voice-enabled electronic device; and

transmitting, by the third-party agent, and to the local agent, the responsive content and the one or more contextual parameters, wherein transmitting the responsive content and the one or more contextual parameters to the local agent causes the local agent to:

output, for presentation to the user via the voice-enabled electronic device, the responsive content in response to the voice input;

receive, via the voice-enabled electronic device, an additional voice input provided by the user and in response to the output being provided for presentation to the user;

determine, via the voice-enabled electronic device, text to transmit to the third-party agent, wherein the text is based on a voice to text conversion of the additional voice input and is based on one or more of the contextual parameters; and

transmit the text to the third-party agent.
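
The message flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not taken from the patent: the class names, the restaurant prompt, the `BOOST` constant, and the additive scoring rule are all hypothetical, and the speech recognizer is stubbed as a dictionary of candidate transcriptions with scores. The claim requires only that the text be based on both the voice to text conversion and one or more of the contextual parameters. It does not say how the two are combined, so the additive boost below is one assumed way of doing it.

```python
from dataclasses import dataclass, field

BOOST = 0.2  # assumed bias added for each token matching a contextual parameter


@dataclass
class AgentResponse:
    """What the third-party agent transmits back to the local agent."""
    responsive_content: str
    contextual_parameters: list = field(default_factory=list)


class ThirdPartyAgent:
    """Hypothetical restaurant-booking agent."""

    def __init__(self):
        self.received_text = []

    def handle_invocation(self, request):
        # Generate responsive content and identify parameters indicative of
        # the further voice input anticipated in reply to that content.
        return AgentResponse(
            responsive_content="What type of cuisine would you like?",
            contextual_parameters=["thai", "laotian", "mediterranean"],
        )

    def receive_text(self, text):
        self.received_text.append(text)


def pick_text(hypotheses, contextual_parameters):
    """Choose a transcription from recognizer hypotheses (text -> score),
    boosting hypotheses that contain anticipated tokens."""
    expected = {p.lower() for p in contextual_parameters}

    def biased_score(item):
        text, score = item
        matches = sum(1 for tok in text.lower().split() if tok in expected)
        return score + BOOST * matches

    return max(hypotheses.items(), key=biased_score)[0]


class LocalAgent:
    """Agent local to the voice-enabled device."""

    def __init__(self, third_party_agent):
        self.third_party_agent = third_party_agent
        self.contextual_parameters = []

    def invoke(self, request):
        response = self.third_party_agent.handle_invocation(request)
        self.contextual_parameters = response.contextual_parameters
        return response.responsive_content  # output for presentation to the user

    def handle_additional_voice_input(self, hypotheses):
        text = pick_text(hypotheses, self.contextual_parameters)
        self.third_party_agent.receive_text(text)  # transmit the text
        return text
```

In this sketch, a recognizer that slightly prefers "tie" over "thai" on acoustics alone is steered to "thai" once the agent has signaled that a cuisine name is anticipated. For example, `LocalAgent(ThirdPartyAgent())` followed by `invoke("book a table")` and `handle_additional_voice_input({"tie": 0.55, "thai": 0.45})` forwards "thai" to the third-party agent.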