US 12,292,915 B1
Security for generative models using attention analysis
Jimit Majmudar, Sunnyvale, CA (US); Rahul Gupta, Waltham, MA (US); Ben Smith, Laguna Beach, CA (US); and Sachin P. Joglekar, Bothell, WA (US)
Assigned to AMAZON TECHNOLOGIES, INC., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Dec. 4, 2023, as Appl. No. 18/527,696.
Int. Cl. G06F 16/383 (2019.01); G06F 16/35 (2019.01); G06F 40/40 (2020.01)
CPC G06F 16/383 (2019.01) [G06F 16/35 (2019.01); G06F 40/40 (2020.01)] 20 Claims
OG exemplary drawing
 
1. A computer-implemented method comprising:
receiving a first natural language input;
generating first prompt data comprising the first natural language input and a first span associated with a first application programming interface (API);
generating, by a large language model (LLM) using the first prompt data, first action plan data, wherein the first action plan data comprises a call to the first API;
determining, using a first classifier model, a first trust score for the first span;
determining, using a multi-head attention unit, a first attention score for the first span and the first action plan data;
generating a risk score by multiplying an inverse of the first trust score by the first attention score;
determining that the risk score is above a first threshold value; and
terminating a current dialog session based on the risk score being greater than the first threshold value.
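The risk-scoring steps of claim 1 can be sketched as follows. This is an illustrative sketch only, not the patented implementation: the function names, the choice of trust and attention values, and the threshold are all assumptions; the claim itself specifies only that the risk score is the inverse of the trust score multiplied by the attention score, compared against a threshold.

```python
# Illustrative sketch of the risk-score computation in claim 1.
# All identifiers here are hypothetical; the claim does not name them.

def compute_risk_score(trust_score: float, attention_score: float) -> float:
    """Risk score = (1 / trust score) * attention score, per claim 1.

    A low trust score for a span (e.g., from the classifier model)
    inflates the risk; a high attention score between that span and
    the generated action plan inflates it further.
    """
    return (1.0 / trust_score) * attention_score


def should_terminate_dialog(trust_score: float,
                            attention_score: float,
                            threshold: float) -> bool:
    """Return True when the risk score exceeds the first threshold value,
    corresponding to the claim's dialog-termination condition."""
    return compute_risk_score(trust_score, attention_score) > threshold
```

For example, a span with trust score 0.5 and attention score 0.4 yields a risk score of 0.8, which would terminate the session under a hypothetical threshold of 0.5 but not under a threshold of 1.0.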