US 11,714,992 B1
Neural network processing based on subgraph recognition
Richard John Heaton, San Jose, CA (US); Randy Renfu Huang, Morgan Hill, CA (US); and Ron Diamant, Albany, CA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Dec. 13, 2018, as Appl. No. 16/219,760.
Int. Cl. G06F 16/00 (2019.01); G06N 3/04 (2023.01); G06F 9/30 (2018.01); G06F 16/901 (2019.01); G06F 9/48 (2006.01)

CPC G06N 3/04 (2013.01) [G06F 9/4881 (2013.01); G06F 9/30003 (2013.01); G06F 16/9024 (2019.01)]

18 Claims

1. A method comprising:

receiving a computational graph, the computational graph including a sequence of computation operations to be performed for a neural network model;

traversing the computational graph to extract a first computational subgraph;

computing a first identifier of the first computational subgraph;

obtaining, based on the first identifier, pre-compiled first instructions associated with the first identifier from a database, the first instructions representing scheduling of resources at a neural network processor to perform first computation operations included in the first computational subgraph;

traversing the computational graph to extract a second computational subgraph;

computing a second identifier of the second computational subgraph;

determining that pre-compiled instructions associated with the second identifier are not stored in the database;

generating, using an on-the-fly compiler, second instructions associated with the second computational subgraph, the second instructions representing scheduling of resources at the neural network processor to perform second computation operations included in the second computational subgraph;

generating an instruction file including the first instructions and the second instructions; and

executing the instruction file at the neural network processor to perform the first computation operations and the second computation operations;

wherein the first computational subgraph and the second computational subgraph are extracted based on:

identifying types of computation operations included in the first and second computation operations,

identifying a sequence of the types of computation operations included in the first and second computation operations, and

identifying a number of input data elements and a number of output data elements of each of the types of computation operations.
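The wherein clause above characterizes a subgraph by the types of its computation operations, the order in which those types occur, and the number of input and output data elements of each operation. A minimal Python sketch of an identifier built from exactly those properties follows. It assumes a subgraph is given as an ordered list of operations and that the identifier is a SHA-256 hash of the ordered signature. The hashing scheme, the `Op` class, and `subgraph_identifier` are hypothetical illustrations, not the patented implementation.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Op:
    """Hypothetical operation record for one node of a computational subgraph."""
    op_type: str              # type of computation operation, e.g. "conv2d" (illustrative)
    num_input_elements: int   # number of input data elements
    num_output_elements: int  # number of output data elements


def subgraph_identifier(ops: List[Op]) -> str:
    """Hypothetical identifier: hash of the ordered (type, #inputs, #outputs) sequence.

    Two subgraphs with the same operation types, in the same order, with the
    same input/output element counts, map to the same identifier.
    """
    signature = ";".join(
        f"{op.op_type}:{op.num_input_elements}:{op.num_output_elements}" for op in ops
    )
    return hashlib.sha256(signature.encode("utf-8")).hexdigest()
```

Because the operation order is part of the hashed signature, reordering the same operations yields a different identifier, which mirrors the claim's reliance on the sequence of operation types.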
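The lookup-or-compile flow of claim 1 can be sketched as follows. The sketch assumes that each subgraph already carries its computed identifier, that the database is an in-memory dictionary, and that instructions are plain strings. `build_instruction_file` and the `compile_on_the_fly` callback are hypothetical names standing in for the database query and the on-the-fly compiler.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a list of instruction strings.
Instructions = List[str]


def build_instruction_file(
    subgraphs: List[Tuple[str, object]],                   # (identifier, subgraph) in execution order
    database: Dict[str, Instructions],                     # identifier -> pre-compiled instructions
    compile_on_the_fly: Callable[[object], Instructions],  # fallback compiler (hypothetical)
) -> Instructions:
    """Assemble one instruction file, reusing pre-compiled instructions when available."""
    instruction_file: Instructions = []
    for identifier, subgraph in subgraphs:
        precompiled = database.get(identifier)
        if precompiled is not None:
            # Database hit: reuse the pre-compiled instructions as-is.
            instruction_file.extend(precompiled)
        else:
            # Database miss: generate instructions with the on-the-fly compiler.
            instruction_file.extend(compile_on_the_fly(subgraph))
    return instruction_file
```

The claim does not state whether newly compiled instructions are written back to the database, so this sketch leaves the database unchanged.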