US 12,229,192 B2
Speculative decoding in autoregressive generative artificial intelligence models
Christopher Lott, San Diego, CA (US); Mingu Lee, San Diego, CA (US); Wonseok Jeon, San Diego, CA (US); and Roland Memisevic, Toronto (CA)
Assigned to QUALCOMM Incorporated, San Diego, CA (US)
Filed by QUALCOMM Incorporated, San Diego, CA (US)
Filed on Dec. 13, 2023, as Appl. No. 18/538,965.
Claims priority of provisional application 63/460,850, filed on Apr. 20, 2023.
Prior Publication US 2024/0354346 A1, Oct. 24, 2024
Int. Cl. G06F 16/901 (2019.01); G06F 40/284 (2020.01)

CPC G06F 16/9027 (2019.01) [G06F 40/284 (2020.01)]

37 Claims

1. A processing system, comprising:

at least one memory having executable instructions stored thereon; and

one or more processors configured to execute the executable instructions in order to cause the processing system to:

generate, based on an input prompt and a generative artificial intelligence model, a first plurality of sets of tokens, each set of tokens in the first plurality of sets of tokens corresponding to a first portion of a candidate response to the input prompt;

speculatively generate, using the generative artificial intelligence model, a second plurality of sets of tokens, each set of tokens in the second plurality of sets of tokens corresponding to a second portion of the candidate response to the input prompt based on the first plurality of sets of tokens;

while speculatively generating the second plurality of sets of tokens, select a set of tokens from the first plurality of sets of tokens; and

output the selected set of tokens from the first plurality of tokens and an associated set of tokens in the second plurality of tokens as a response to the input prompt.
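
The four steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the toy model (`toy_next_token`), the vocabulary, the selection criterion (`score`), and the use of a thread pool to overlap selection with speculative generation are all assumptions made for illustration. The claim itself does not specify how candidates are scored or how the overlap is achieved.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy vocabulary; a real system would use a model's tokenizer.
VOCAB = ["a", "b", "c", "d"]


def toy_next_token(context, branch):
    """Stand-in for a generative model: deterministically picks the next token.

    `branch` plays the role of a sampling choice so that different candidate
    sets diverge. This function is an assumption, not part of the claim.
    """
    return VOCAB[(len(context) + branch) % len(VOCAB)]


def generate_set(context, branch, length):
    """Autoregressively generate `length` tokens that continue `context`."""
    ctx = list(context)
    tokens = []
    for _ in range(length):
        tok = toy_next_token(ctx, branch)
        tokens.append(tok)
        ctx.append(tok)
    return tokens


def score(tokens):
    """Hypothetical selection criterion: sum of vocabulary indices."""
    return sum(VOCAB.index(t) for t in tokens)


def respond(prompt, num_candidates=3, set_len=2):
    # Step 1: generate a first plurality of token sets, each a candidate
    # first portion of the response.
    first_sets = [generate_set(prompt, k, set_len) for k in range(num_candidates)]

    with ThreadPoolExecutor() as pool:
        # Step 2: speculatively generate a second token set for every
        # candidate, before it is known which candidate will be selected.
        futures = [
            pool.submit(generate_set, prompt + first, k, set_len)
            for k, first in enumerate(first_sets)
        ]
        # Step 3: select among the first sets while the speculative
        # continuations are still being produced by the pool.
        chosen = max(range(num_candidates), key=lambda k: score(first_sets[k]))
        second_sets = [f.result() for f in futures]

    # Step 4: output the selected first set with its associated second set.
    return first_sets[chosen] + second_sets[chosen]


if __name__ == "__main__":
    print(respond(["x", "y"]))
```

In this sketch the continuations for the unselected candidates are discarded. That wasted work is traded for not having to wait for the selection before starting the second portion of the response.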