CPC G06N 3/10 (2013.01) [G06F 9/38873 (2023.08); G06N 3/04 (2013.01)]. 20 Claims.

1. A computer-implemented method comprising:
receiving, at an interface of a multi-tenant provider network, a request to deploy a deep neural network (DNN) type machine learning (ML) model to one or more edge computing devices having a hardware platform from among a plurality of hardware platforms, the request including an identifier of the ML model or a storage location of the ML model within the provider network;
obtaining the ML model from the storage location;
generating an intermediate representation for the ML model, the intermediate representation including one or more nodes corresponding to one or more operators utilized by the ML model;
generating a plurality of scores for a plurality of schedules for at least one node of the intermediate representation using a hardware-specific linear cost model, the hardware-specific linear cost model including terms corresponding to different hardware features of the hardware platform;
identifying, for the at least one node of the intermediate representation, an optimized schedule for at least one operator corresponding to the at least one node based on the plurality of scores generated using the hardware-specific linear cost model;
generating an optimized intermediate representation using the optimized schedule;
generating code corresponding to the ML model based at least in part on the optimized intermediate representation, wherein the code is specific to the hardware platform; and
transmitting the code for deployment to the one or more edge computing devices.
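The scheduling step recited above can be illustrated with a minimal sketch: score each candidate schedule for an IR node using a linear cost model whose terms weight hardware features, then identify the optimized schedule as the lowest-scoring candidate. All names, features, and weights below are hypothetical illustrations, not the patent's actual implementation.

```python
# Hypothetical sketch of the claimed scheduling step. The feature names
# and hardware-specific weights are illustrative assumptions only.
from dataclasses import dataclass, field

@dataclass
class Schedule:
    name: str
    # Per-schedule feature counts (assumed features, e.g. cache misses
    # incurred, vector instructions issued, synchronization barriers).
    features: dict = field(default_factory=dict)

# Hardware-specific linear cost model: one weight term per hardware
# feature of the target platform (values are made up for illustration).
HW_WEIGHTS = {"cache_misses": 4.0, "vector_ops": 0.5, "sync_barriers": 10.0}

def linear_cost(schedule: Schedule, weights: dict) -> float:
    """Score a schedule as a weighted sum of its hardware-feature counts."""
    return sum(w * schedule.features.get(f, 0) for f, w in weights.items())

def best_schedule(candidates: list, weights: dict) -> Schedule:
    """Identify the optimized schedule: the candidate with the lowest score."""
    return min(candidates, key=lambda s: linear_cost(s, weights))

candidates = [
    Schedule("tiled",      {"cache_misses": 10, "vector_ops": 100, "sync_barriers": 2}),
    Schedule("vectorized", {"cache_misses": 30, "vector_ops": 40,  "sync_barriers": 0}),
]

chosen = best_schedule(candidates, HW_WEIGHTS)  # lowest weighted cost wins
```

Because the weights are per-platform, re-scoring the same candidate schedules under a different weight vector can select a different schedule, which is how the code generated downstream becomes specific to the target hardware platform.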