US 11,657,260 B2
Neural network hardware accelerator data parallelism
Nikolay Nez, Tokyo (JP); Oleg Khavin, Tokyo (JP); Tanvir Ahmed, Tokyo (JP); Jens Huthmann, Hyogo (JP); and Sakyasingha Dasgupta, Tokyo (JP)
Assigned to EDGECORTIX PTE. LTD., Singapore (SG)
Filed by Edgecortix Pte. Ltd., Singapore (SG)
Filed on Oct. 26, 2021, as Appl. No. 17/510,397.
Prior Publication US 2023/0128600 A1, Apr. 27, 2023
Int. Cl. G06N 3/063 (2023.01)
CPC G06N 3/063 (2013.01) 18 Claims
OG exemplary drawing
 
1. An integrated circuit comprising:
a plurality of memory banks, each memory bank among the plurality of memory banks configured to store values and to transmit stored values;
a plurality of computation units, each computation unit among the plurality of computation units including:
a processor including circuitry configured to perform a mathematical operation on an input data value and a weight value to produce a resultant data value, and
a computation controller configured to:
receive the input data value from any of the plurality of memory banks,
receive the weight value from any of the plurality of memory banks,
cause the processor to perform the mathematical operation, and
transmit the resultant data value to at least two memory banks among the plurality of memory banks, such that a single transmission of the resultant data value is received by the at least two memory banks at negligibly different times, wherein differences between the negligibly different times are attributable to differences in physical distances between the computation unit and respective ones of the at least two memory banks;
a plurality of interconnects connecting each computation unit among the plurality of computation units to each memory bank among the plurality of memory banks, wherein the plurality of interconnects route the single transmission of the resultant data value from the single computation unit to the at least two memory banks;
wherein the computation controller of a first computation unit among the plurality of computation units is further configured to:
synchronize a second computation unit among the plurality of computation units to receive one of the input data value or the weight value, and
read the one of the input data value or the weight value from the memory bank storing the one of the input data value or the weight value among the plurality of memory banks, such that the memory bank storing the one of the input data value or the weight value makes a single transmission of the one of the input data value or the weight value to be read by the computation controller and the second computation unit at two negligibly different times, wherein a difference between the two negligibly different times is attributable to a difference in a physical distance between the computation controller and the memory bank storing the one of the input data value or the weight value, and a physical distance between the second computation unit and the memory bank storing the one of the input data value or the weight value,
wherein the plurality of interconnects route output from the memory bank storing the one of the input data value or the weight value among the plurality of memory banks to the first computation unit and the second computation unit to facilitate the single transmission of the one of the input data value or the weight value from the memory bank storing the one of the input data value or the weight value among the plurality of memory banks to the first computation unit and the second computation unit.
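The claimed fabric can be pictured as a full crossbar between computation units and memory banks, where one write is multicast to several banks and one bank read is received by several synchronized units. The following is a minimal behavioral sketch of that data-flow, assuming a simple software model; all class, method, and parameter names (`MemoryBank`, `ComputationUnit`, `compute_and_multicast`, `synchronized_read`) are illustrative stand-ins, not the patented implementation, and timing effects such as the negligibly different arrival times are not modeled.

```python
# Hedged behavioral sketch of the claimed data-parallel fabric.
# All names are hypothetical; this is not the patented circuit.

class MemoryBank:
    """A memory bank that stores values and transmits stored values."""

    def __init__(self):
        self.values = {}

    def store(self, addr, value):
        self.values[addr] = value

    def load(self, addr):
        return self.values[addr]


class ComputationUnit:
    """One processor plus computation controller pair."""

    def __init__(self, banks):
        # Full crossbar: every unit is connected to every bank.
        self.banks = banks

    def operate(self, data, weight):
        # Stand-in for the claimed "mathematical operation" on an
        # input data value and a weight value.
        return data * weight

    def compute_and_multicast(self, data_bank, data_addr,
                              weight_bank, weight_addr,
                              dest_banks, dest_addr):
        # Receive the input data value and weight value from any bank.
        x = self.banks[data_bank].load(data_addr)
        w = self.banks[weight_bank].load(weight_addr)
        y = self.operate(x, w)
        # One logical transmission of the resultant data value reaches
        # every destination bank over the interconnects.
        for b in dest_banks:
            self.banks[b].store(dest_addr, y)
        return y


def synchronized_read(bank, addr, units):
    """One bank transmission received by several synchronized units,
    modeling the first controller synchronizing a second unit to read
    the same input data value or weight value."""
    value = bank.load(addr)        # single transmission from the bank
    return [value for _ in units]  # each synchronized unit receives it


banks = [MemoryBank() for _ in range(4)]
units = [ComputationUnit(banks) for _ in range(2)]

banks[0].store(0, 3.0)  # input data value
banks[1].store(0, 2.0)  # weight value

# Unit 0 computes 3.0 * 2.0 and multicasts the result to banks 2 and 3.
result = units[0].compute_and_multicast(
    data_bank=0, data_addr=0,
    weight_bank=1, weight_addr=0,
    dest_banks=[2, 3], dest_addr=0)
assert banks[2].load(0) == banks[3].load(0) == 6.0

# Both units receive the same single transmission of the weight value.
received = synchronized_read(banks[1], 0, units)
assert received == [2.0, 2.0]
```

In this sketch the multicast write and the synchronized read each touch the source exactly once, which is the data-parallelism benefit the claim describes: one transmission serves multiple consumers instead of one transfer per consumer.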