US 12,277,400 B1
Multimedia content management for large language model(s) and/or other generative model(s)
Sanil Jain, Sunnyvale, CA (US); Wei Yu, Mountain View, CA (US); Ágoston Weisz, Zurich (CH); Michael Andrew Goodman, Oakland, CA (US); Diana Avram, Zurich (CH); Amin Ghafouri, San Francisco, CA (US); Golnaz Ghiasi, Mountain View, CA (US); Igor Petrovski, Zurich (CH); Khyatti Gupta, Zurich (CH); Oscar Akerlund, Zurich (CH); Evgeny Sluzhaev, Zurich (CH); Rakesh Shivanna, Sunnyvale, CA (US); Thang Luong, Santa Clara, CA (US); Komal Singh, Kitchener (CA); Yifeng Lu, Mountain View, CA (US); and Vikas Peswani, Mountain View, CA (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by GOOGLE LLC, Mountain View, CA (US)
Filed on Feb. 28, 2024, as Appl. No. 18/590,498.
Application 18/590,498 is a continuation of application No. 18/520,218, filed on Nov. 27, 2023, granted, now 11,947,923.
Int. Cl. G06F 40/40 (2020.01); G06V 10/70 (2022.01)
CPC G06F 40/40 (2020.01) [G06V 10/70 (2022.01)]
20 Claims
1. A method implemented by one or more processors, the method comprising:
receiving natural language (NL) based input associated with a client device of a user, the NL based input requesting multimedia content;
generating a response that is responsive to the NL based input, wherein generating the response that is responsive to the NL based input comprises:
processing, using a large language model (LLM), LLM input to generate LLM output, the LLM input including at least the NL based input;
determining, based on the LLM output, textual content and multimedia content to be included in the response that is responsive to the NL based input;
initiating obtaining of the multimedia content to be included in the response that is responsive to the NL based input;
while obtaining the multimedia content to be included in the response that is responsive to the NL based input:
determining, based on one or more signals, whether to continue obtaining the multimedia content to be included in the response that is responsive to the NL based input; and
in response to determining to refrain from continuing to obtain the multimedia content to be included in the response that is responsive to the NL based input:
disengaging obtaining of the multimedia content; and
determining canned textual content or other textual content to be included in the response, and in lieu of the multimedia content, that is responsive to the NL based input; and
causing the response, including the canned textual content or the other textual content, to be rendered at the client device of the user.
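The control flow recited in claim 1 can be illustrated with a minimal sketch, which is not the patented implementation. It assumes the "one or more signals" reduce to a single latency budget and that multimedia is obtained by an asynchronous task. Every name here (`fetch_multimedia`, `should_continue`, `respond`, `CANNED_TEXT`) is hypothetical, and the LLM step is replaced by a fixed string.

```python
import asyncio

# Hypothetical canned text used in lieu of the multimedia content.
CANNED_TEXT = "Sorry, the requested image could not be produced in time."


async def fetch_multimedia(prompt: str, delay: float) -> str:
    """Hypothetical stand-in for an image generator or retrieval backend."""
    await asyncio.sleep(delay)
    return f"<image for: {prompt}>"


def should_continue(elapsed: float, budget: float) -> bool:
    """Assumed signal: keep obtaining only while within a latency budget."""
    return elapsed < budget


async def respond(nl_input: str, fetch_delay: float, budget: float,
                  poll: float = 0.01) -> dict:
    # Stand-in for processing LLM input and deriving textual content.
    text = f"Here is a response to '{nl_input}'."

    # Initiate obtaining of the multimedia content.
    task = asyncio.create_task(fetch_multimedia(nl_input, fetch_delay))
    loop = asyncio.get_running_loop()
    start = loop.time()

    # While obtaining, repeatedly decide whether to continue.
    while not task.done():
        if not should_continue(loop.time() - start, budget):
            # Disengage obtaining and fall back to canned text.
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                pass
            return {"text": text, "media": None, "fallback_text": CANNED_TEXT}
        await asyncio.sleep(poll)

    return {"text": text, "media": task.result(), "fallback_text": None}


if __name__ == "__main__":
    # Slow backend with a tight budget: the fetch is cancelled.
    print(asyncio.run(respond("a cat", fetch_delay=0.5, budget=0.05)))
    # Fast backend with a generous budget: the media is returned.
    print(asyncio.run(respond("a cat", fetch_delay=0.01, budget=1.0)))
```

In this sketch the cancelled case still returns the textual content alongside the canned fallback; a caller that wants the fallback alone can render only `fallback_text`.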