US 12,437,204 B1
Scheduling neural inference tasks
Stanislaw Ignacy Pasko, Zawonia (PL); and Alexander Ivchenko, Gdansk (PL)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Jun. 29, 2021, as Appl. No. 17/362,361.
Int. Cl. G10L 15/16 (2006.01); G06F 9/48 (2006.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/088 (2023.01); G10L 15/22 (2006.01)

CPC G06N 3/088 (2013.01) [G06F 9/4818 (2013.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G10L 15/16 (2013.01); G06F 2209/484 (2013.01); G10L 15/22 (2013.01); G10L 2015/223 (2013.01)]

20 Claims

1. A computer-implemented method comprising:

receiving a first portion of audio data corresponding to first audio detected by a device;

identifying a first neural network configured to determine a user profile from a sound of speech represented in the audio data and a second neural network configured to perform automatic speech recognition (ASR) processing of the speech, wherein the second neural network corresponds to a larger memory footprint than the first neural network;

in response to the second neural network corresponding to a larger memory footprint than the first neural network, processing the first portion using first model data corresponding to the first neural network;

after processing the first portion using the first model data, processing the first portion using second model data corresponding to the second neural network to generate a first output;

determining that the second neural network is associated with a higher priority than the first neural network;

receiving a first request for a second portion of the audio data for processing using the first neural network;

after receiving the first request, receiving a second request for the second portion for processing using the second neural network;

receiving the second portion of the audio data; and

in response to determining that the second neural network is associated with a higher priority than the first neural network, processing the second portion using the second neural network to generate a second output before processing the second portion using the first neural network.
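The two ordering rules recited in claim 1 can be illustrated with a minimal Python sketch. It is offered only as a reading aid under stated assumptions, not as the patented implementation: the names `Model`, `order_by_footprint`, and `PriorityScheduler`, the footprint figures, and the numeric priority values are all hypothetical and do not appear in the claim. The first portion is processed in order of increasing memory footprint; for the second portion, pending requests are served in order of decreasing priority regardless of arrival order.

```python
import heapq
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Model:
    """Hypothetical descriptor for a neural network known to the scheduler."""
    name: str
    memory_footprint_mb: int  # assumed unit; the claim only compares footprints
    priority: int             # assumed convention: larger value = higher priority


def order_by_footprint(models):
    """Order models for the first portion: smaller memory footprint first."""
    return sorted(models, key=lambda m: m.memory_footprint_mb)


class PriorityScheduler:
    """Queues per-model requests and serves them by priority."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker that preserves arrival order

    def request(self, model, portion_id):
        # heapq is a min-heap, so the priority is negated to pop the
        # highest-priority request first.
        heapq.heappush(
            self._heap, (-model.priority, next(self._seq), model, portion_id)
        )

    def drain(self):
        """Return (model name, portion id) pairs in the order they would run."""
        order = []
        while self._heap:
            _, _, model, portion_id = heapq.heappop(self._heap)
            order.append((model.name, portion_id))
        return order


if __name__ == "__main__":
    # Illustrative values only.
    speaker_id = Model("speaker_id", memory_footprint_mb=50, priority=1)
    asr = Model("asr", memory_footprint_mb=500, priority=2)

    # First portion: the smaller network runs before the larger one.
    print([m.name for m in order_by_footprint([asr, speaker_id])])

    # Second portion: the speaker_id request arrives first, but the
    # higher-priority asr request is served first.
    scheduler = PriorityScheduler()
    scheduler.request(speaker_id, "portion-2")
    scheduler.request(asr, "portion-2")
    print(scheduler.drain())
```

In this sketch the sequence counter only breaks ties between requests of equal priority, so arrival order matters only when priorities match, which mirrors the claim's step of serving the later, higher-priority request first.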