US 12,423,074 B1
Neural network layer fusion
Bin Fan, San Jose, CA (US); Evghenii Gaburov, Santa Clara, CA (US); Yuan Lin, Cupertino, CA (US); and Vinod Grover, Mercer Island, WA (US)
Assigned to NVIDIA Corporation, Santa Clara, CA (US)
Filed by NVIDIA Corporation, Santa Clara, CA (US)
Filed on Jan. 24, 2020, as Appl. No. 16/752,552.
Int. Cl. G06F 8/41 (2018.01); G06F 9/54 (2006.01); G06N 3/04 (2023.01)
CPC G06F 8/4441 (2013.01) [G06F 9/54 (2013.01); G06N 3/04 (2013.01)]
32 Claims
 
1. A non-transitory machine-readable medium having stored thereon instructions, which, if performed at least in part by one or more processors, cause the one or more processors to at least:
implement a compiler that selects, from a graph representing one or more corresponding processor-executable functions, two or more nodes of the graph to be combined based, at least in part, on a comparison of cost values associated with edges of the graph connecting respective pairs of nodes, wherein a cost value of an edge of the graph indicates one or more first computing resources to perform one or more operations corresponding to the selected two or more nodes, wherein the one or more first computing resources comprise computing capacity used to transfer data between operations associated with the selected two or more nodes; and
generate, by the compiler, processor-executable code comprising code to perform a function resulting from a combination of processor-executable functions corresponding to the selected two or more nodes.
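
As an illustration only, the selection and combination steps recited in claim 1 might be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the claim does not say how the cost values are compared, so the sketch assumes the edge with the highest data-transfer cost is the one chosen for fusion. The names select_edge_to_fuse, fuse, nodes, and edge_costs, as well as the cost figures, are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, str]
Layer = Callable[[List[float]], List[float]]


def select_edge_to_fuse(edge_costs: Dict[Edge, float]) -> Edge:
    """Compare edge cost values and return the highest-cost (producer, consumer) pair.

    Assumption: a higher cost means more capacity spent transferring data
    between the two operations, so fusing that pair saves the most.
    """
    return max(edge_costs, key=lambda edge: edge_costs[edge])


def fuse(producer: Layer, consumer: Layer) -> Layer:
    """Return one function that performs both operations back to back."""
    def fused(x: List[float]) -> List[float]:
        # The intermediate result is passed directly to the consumer
        # instead of being handed off as a separate graph edge.
        return consumer(producer(x))
    return fused


# Hypothetical three-node chain: scale -> bias -> relu.
nodes: Dict[str, Layer] = {
    "scale": lambda x: [2.0 * v for v in x],
    "bias": lambda x: [v + 1.0 for v in x],
    "relu": lambda x: [max(0.0, v) for v in x],
}

# Hypothetical cost values for the edges connecting each pair of nodes.
edge_costs: Dict[Edge, float] = {
    ("scale", "bias"): 4.0,
    ("bias", "relu"): 9.0,
}

pair = select_edge_to_fuse(edge_costs)
fused_fn = fuse(nodes[pair[0]], nodes[pair[1]])
result = fused_fn([-3.0, 0.5])
print(pair, result)
```

In this sketch the function returned by fuse stands in for the generated processor-executable code; an actual compiler would emit a single kernel for the combined operations rather than compose Python callables.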