US 12,093,696 B1
Bus for transporting output values of a neural network layer to cores specified by configuration data
Kenneth Duong, San Jose, CA (US); Jung Ko, San Jose, CA (US); and Steven L. Teig, Menlo Park, CA (US)
Assigned to PERCEIVE CORPORATION, San Jose, CA (US)
Filed by Perceive Corporation, San Jose, CA (US)
Filed on Aug. 9, 2019, as Appl. No. 16/537,481.
Application 16/537,481 is a continuation-in-part of application No. 16/120,387, filed on Sep. 3, 2018, granted, now Pat. No. 10,740,434.
Claims priority of provisional application 62/873,804, filed on Jul. 12, 2019.
Claims priority of provisional application 62/853,128, filed on May 27, 2019.
Claims priority of provisional application 62/797,910, filed on Jan. 28, 2019.
Claims priority of provisional application 62/792,123, filed on Jan. 14, 2019.
Claims priority of provisional application 62/773,164, filed on Nov. 29, 2018.
Claims priority of provisional application 62/773,162, filed on Nov. 29, 2018.
Claims priority of provisional application 62/753,878, filed on Oct. 31, 2018.
Claims priority of provisional application 62/742,802, filed on Oct. 8, 2018.
Claims priority of provisional application 62/724,589, filed on Aug. 29, 2018.
Claims priority of provisional application 62/660,914, filed on Apr. 20, 2018.
Int. Cl. G06F 9/38 (2018.01); G06F 15/80 (2006.01); G06N 3/063 (2023.01); G06N 5/046 (2023.01)
CPC G06F 9/3877 (2013.01) [G06F 15/80 (2013.01); G06N 3/063 (2013.01); G06N 5/046 (2013.01)] 16 Claims
OG exemplary drawing
 
1. A neural network inference circuit for executing a neural network that comprises a plurality of computation nodes at a plurality of layers, the neural network inference circuit comprising:
a plurality of core circuits comprising memories for storing input values for the computation nodes of the neural network;
a set of post-processing circuits for computing output values of the computation nodes of the neural network, wherein output values of computation nodes of a first layer of the neural network are for storage in the memories of the plurality of core circuits as input values for a second layer of the neural network; and
an output bus, comprising a plurality of lanes, that connects the set of post-processing circuits to the plurality of core circuits, the output bus for (i) receiving a set of output values from the set of post-processing circuits, (ii) transporting the set of output values to the plurality of core circuits based on configuration data specifying a core circuit at which each output value of the set of output values is to be stored, and (iii) aligning the set of output values for storage in the plurality of core circuits,
wherein:
the plurality of lanes of the output bus are ordered and indexed, each lane corresponding to a set of the post-processing circuits;
for a particular clock cycle, each lane receives at most one computed output value from one of the post-processing circuits of its corresponding set;
for a particular core circuit that receives output values in the particular clock cycle, the subset of the output values transported to that core circuit is carried on contiguous lanes of the output bus; and
the output bus aligns the subset of output values transported to the particular core circuit by shifting that subset by an amount based on the lowest index of the contiguous lanes.