US 12,437,233 B2
Autonomous allocation of deep neural network inference requests in a cluster with heterogeneous devices
Suryaprakash Shanmugam, Santa Clara, CA (US); Yamini Nimmagadda, Portland, OR (US); and Akhila Vidiyala, Beaverton, OR (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Sep. 9, 2021, as Appl. No. 17/470,654.
Prior Publication US 2021/0406777 A1, Dec. 30, 2021
Int. Cl. G06N 20/00 (2019.01); G06F 9/48 (2006.01); G06F 9/50 (2006.01)
CPC G06N 20/00 (2019.01) [G06F 9/4881 (2013.01); G06F 9/5044 (2013.01); G06F 2209/485 (2013.01); G06F 2209/501 (2013.01); G06F 2209/503 (2013.01)] 25 Claims
OG exemplary drawing
 
1. A computing system comprising:
a network controller to communicate with edge nodes;
a processor coupled to the network controller; and
a memory coupled to the processor, the memory including a set of executable program instructions, which when executed by the processor, cause the computing system to:
identify compute capacities of the edge nodes and memory capacities of the edge nodes;
identify a first variant of an Artificial Intelligence (AI) model; and
assign the first variant to a first edge node of the edge nodes based on a compute capacity requirement associated with execution of the first variant, a memory resource requirement associated with the execution of the first variant, the compute capacities of the edge nodes, and the memory capacities of the edge nodes.
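
Claim 1 recites a capacity-aware placement decision: compare a model variant's compute and memory requirements against the per-node capacities of the edge nodes and assign the variant to a node that can execute it. The sketch below is a minimal, hypothetical illustration of that kind of assignment logic in Python; the names (EdgeNode, ModelVariant, assign_variant), the headroom-based tie-breaking rule, and the example capacity figures are assumptions chosen for illustration and are not taken from the patent.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class EdgeNode:
    # Hypothetical per-node capacities (claim: "compute capacities of the edge
    # nodes and memory capacities of the edge nodes").
    name: str
    compute_capacity: float   # e.g., available TOPS
    memory_capacity: float    # e.g., available MiB

@dataclass
class ModelVariant:
    # Hypothetical per-variant requirements (claim: "compute capacity requirement"
    # and "memory resource requirement" associated with executing the variant).
    name: str
    compute_requirement: float
    memory_requirement: float

def assign_variant(variant: ModelVariant, nodes: List[EdgeNode]) -> Optional[EdgeNode]:
    # Keep only nodes whose capacities cover both requirements of the variant.
    feasible = [
        n for n in nodes
        if n.compute_capacity >= variant.compute_requirement
        and n.memory_capacity >= variant.memory_requirement
    ]
    if not feasible:
        return None
    # Assumed tie-break: prefer the node with the most compute headroom remaining
    # after placing the variant.
    return max(feasible, key=lambda n: n.compute_capacity - variant.compute_requirement)

if __name__ == "__main__":
    nodes = [
        EdgeNode("cpu-node", compute_capacity=2.0, memory_capacity=4096.0),
        EdgeNode("gpu-node", compute_capacity=16.0, memory_capacity=8192.0),
    ]
    first_variant = ModelVariant("variant-fp16", compute_requirement=4.0, memory_requirement=2048.0)
    target = assign_variant(first_variant, nodes)
    print(first_variant.name, "->", target.name if target else "no feasible node")

A scheduler in the claimed system would presumably weigh additional factors beyond these two; the headroom heuristic above merely stands in for the claim's "based on" clause relating variant requirements to node capacities.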