CPC G06F 8/47 (2013.01) [G06F 16/9024 (2019.01); G06N 20/00 (2019.01)]; 20 Claims
1. A system, comprising:
a processor; and
a compiler executed by the processor, wherein the compiler is configured to:
receive a machine learning (ML) model;
generate a graph associated with the ML model, wherein the graph is an internal representation of the ML model;
partition the graph into a first subgraph and a second subgraph, wherein the first subgraph is associated with an ML hardware, wherein the second subgraph is associated with a processor different from the ML hardware, and wherein the partition is based on at least one of:
a) whether an operation within a node of the graph is supported by the ML hardware, or
b) latency associated with a node if the node is included within the first subgraph as opposed to the second subgraph, or
c) an amount of data movement if a node is included within the first subgraph as opposed to the second subgraph;
generate a set of low-level instructions associated with the first subgraph; and
identify one or more resources in the ML hardware to execute the set of low-level instructions associated with the first subgraph.
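The partition, lowering, and resource-identification steps of claim 1 can be illustrated with a minimal Python sketch. This is not the claimed implementation. The `Node` fields, the `SUPPORTED_OPS` set, the cost model (estimated node latency plus bytes moved across the hardware/host boundary divided by an assumed `bytes_per_us` bandwidth), the textual pseudo-instructions, and the round-robin core assignment are all hypothetical choices made for illustration. Each of the three recited criteria (a), (b), and (c) is marked in the comments.

```python
from dataclasses import dataclass, field

# Hypothetical set of operations the ML hardware can execute natively.
SUPPORTED_OPS = {"conv2d", "matmul", "relu", "add"}


@dataclass
class Node:
    """One node of the compiler's internal graph (fields are illustrative)."""
    name: str
    op: str
    inputs: list[str] = field(default_factory=list)
    hw_latency_us: float = 0.0    # assumed estimated latency on the ML hardware
    host_latency_us: float = 0.0  # assumed estimated latency on the other processor
    output_bytes: int = 0         # size of the tensor this node produces


def partition(nodes, supported_ops=SUPPORTED_OPS, bytes_per_us=1000.0):
    """Greedily split topologically ordered nodes into (ML-hardware, host) subgraphs."""
    by_name = {n.name: n for n in nodes}
    placement = {}
    for n in nodes:
        # Criterion (a): an unsupported operation must stay on the host processor.
        if n.op not in supported_ops:
            placement[n.name] = "host"
            continue
        # Criterion (c): bytes that would cross the hardware/host boundary.
        to_hw = sum(by_name[i].output_bytes for i in n.inputs if placement.get(i) == "host")
        to_host = sum(by_name[i].output_bytes for i in n.inputs if placement.get(i) == "ml_hw")
        # Criterion (b): node latency on each side, plus transfer time at the assumed bandwidth.
        hw_cost = n.hw_latency_us + to_hw / bytes_per_us
        host_cost = n.host_latency_us + to_host / bytes_per_us
        placement[n.name] = "ml_hw" if hw_cost <= host_cost else "host"
    first = [n for n in nodes if placement[n.name] == "ml_hw"]
    second = [n for n in nodes if placement[n.name] == "host"]
    return first, second


def lower(subgraph):
    """Emit one textual pseudo-instruction per node (a stand-in for real low-level code)."""
    return [f"{n.op.upper()} {n.name} <- {', '.join(n.inputs) or 'input'}" for n in subgraph]


def assign_resources(instructions, num_cores=2):
    """Assign each instruction round-robin to a hypothetical processing-core index."""
    return [(idx % num_cores, instr) for idx, instr in enumerate(instructions)]


def example_graph():
    """A small hypothetical graph with one operation the ML hardware does not support."""
    return [
        Node("conv", "conv2d", [], hw_latency_us=10, host_latency_us=200, output_bytes=4000),
        Node("nms", "non_max_suppression", ["conv"], host_latency_us=50, output_bytes=100),
        Node("relu", "relu", ["conv"], hw_latency_us=1, host_latency_us=5, output_bytes=4000),
        Node("add", "add", ["nms", "relu"], hw_latency_us=1, host_latency_us=2, output_bytes=4000),
    ]


if __name__ == "__main__":
    first, second = partition(example_graph())
    print([n.name for n in first], [n.name for n in second])
    for core, instr in assign_resources(lower(first)):
        print(core, instr)
```

On the example graph, `conv`, `relu`, and `add` land in the first (ML-hardware) subgraph, and `nms` stays in the second subgraph because `non_max_suppression` is not in the assumed supported set. The greedy per-node decision is only one possible strategy. A compiler could instead minimize the total cost over the whole graph, for example as a min-cut problem.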