CPC G10L 15/197 (2013.01) [G10L 15/1815 (2013.01); G10L 15/22 (2013.01); G10L 15/285 (2013.01)]
20 Claims

1. A computer-implemented method comprising:
receiving first input data representing a first user input;
generating a first prompt including at least the first input data, the first prompt being a first input for a large language model (LLM) to determine a response to the first user input;
determining, using the LLM, first encoded representations corresponding to the first prompt;
storing, using a cache associated with the LLM, the first encoded representations;
performing, using the LLM and the first encoded representations, a first iteration of processing to determine a response to the first user input, the first iteration of processing resulting in generation of first processing data;
determining second encoded representations corresponding to the first processing data;
storing, using the cache, the second encoded representations;
performing, using the LLM, the first encoded representations and the second encoded representations, a second iteration of processing to determine a first response corresponding to the first user input;
causing presentation of the first response;
based on the LLM determining the first response, deleting, from the cache, the second encoded representations;
receiving second input data representing a second user input;
generating a second prompt including at least the first input data and the second input data, the second prompt being a second input for the LLM to determine a response to the second user input;
determining, from the cache, the first encoded representations corresponding to a first portion of the second prompt, wherein the first portion of the second prompt includes the first input data;
determining, using the LLM, third encoded representations corresponding to a second portion of the second prompt, wherein the second portion of the second prompt includes the second input data;
determining, using the LLM, the first encoded representations and the third encoded representations, a second response to the second user input; and
causing presentation of the second response.
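Claim 1 does not say how the encoded representations are computed or stored. The Python below is a minimal sketch that assumes they behave like a transformer's per-token key/value cache, where each entry is valid only for an identical token prefix. Under that assumption it models the flow of the claim. The prompt encodings cached on the first turn ("first encoded representations") are reused on the second turn. Only the new suffix ("third encoded representations") is encoded. Encodings of intermediate processing data ("second encoded representations") go into a separate scratch area that is cleared once a response exists. `ToyLLM`, `encode_step`, `build_prompt`, and the `"User:"` template are hypothetical. The integer "representations" stand in for real attention tensors, and the responses are meaningless placeholders.

```python
import zlib
from dataclasses import dataclass, field
from typing import List

MOD = 2**31 - 1  # keeps the toy integer "representations" bounded


def encode_step(prev: int, token: str) -> int:
    """Toy stand-in for one token's encoded representation (e.g. a K/V entry).

    Each value depends on the token and on every value before it, so a cached
    value is only reusable for an identical prefix, as with real attention caches.
    """
    return (prev * 1_000_003 + zlib.crc32(token.encode("utf-8"))) % MOD


@dataclass
class ToyLLM:
    prompt_tokens: List[str] = field(default_factory=list)  # tokens covered by prompt_cache
    prompt_cache: List[int] = field(default_factory=list)   # prompt encodings ("first"/"third")
    scratch_cache: List[int] = field(default_factory=list)  # processing-data encodings ("second")
    encode_calls: int = 0                                   # tokens actually run through encode_step

    def _encode(self, tokens: List[str], prev: int) -> List[int]:
        out = []
        for tok in tokens:
            prev = encode_step(prev, tok)
            out.append(prev)
        self.encode_calls += len(tokens)
        return out

    def encode_prompt(self, tokens: List[str]) -> None:
        if not tokens:
            raise ValueError("prompt must contain at least one token")
        # Length of the longest prefix whose encodings are already cached.
        n = 0
        while (n < len(self.prompt_tokens) and n < len(tokens)
               and self.prompt_tokens[n] == tokens[n]):
            n += 1
        prev = self.prompt_cache[n - 1] if n else 0
        # Keep cached encodings for the shared prefix; encode only the rest.
        self.prompt_cache = self.prompt_cache[:n] + self._encode(tokens[n:], prev)
        self.prompt_tokens = list(tokens)

    def _latest(self) -> int:
        # Most recent cached state: scratch entries follow the prompt entries.
        return (self.scratch_cache or self.prompt_cache)[-1]

    def iterate(self) -> str:
        # One intermediate iteration: produce placeholder "processing data"
        # (e.g. a reasoning step) and cache its encoding in the scratch area.
        data = f"step{self._latest() % 97}"
        self.scratch_cache += self._encode([data], self._latest())
        return data

    def respond(self, tokens: List[str], iterations: int = 2) -> str:
        self.encode_prompt(tokens)
        for _ in range(iterations - 1):  # every iteration except the final one
            self.iterate()
        response = f"answer{self._latest() % 1000}"  # placeholder final output
        self.scratch_cache.clear()  # drop processing-data encodings once answered
        return response


def build_prompt(turns: List[str]) -> List[str]:
    # Hypothetical template: one "User:" marker per turn, oldest turn first.
    tokens: List[str] = []
    for turn in turns:
        tokens += ["User:"] + turn.split()
    return tokens


llm = ToyLLM()
turn1, turn2 = "what is the weather", "and tomorrow"
r1 = llm.respond(build_prompt([turn1]))
calls_after_turn1 = llm.encode_calls  # 5 prompt tokens + 1 processing step
r2 = llm.respond(build_prompt([turn1, turn2]))
print(calls_after_turn1, llm.encode_calls)  # 6 10
```

In this sketch the second turn encodes only its three new prompt tokens plus one processing step, whereas a model with an empty cache would re-encode all eight prompt tokens. The scratch encodings can be discarded because, under the assumed template, the second prompt contains the first and second input data but not the intermediate processing data.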