US 12,032,653 B2
Method and apparatus for distributed and cooperative computation in artificial neural networks
Frederico C. Pratas, Mirandela (ES); Ayose J. Falcon, Barcelona (ES); Marc Lupon, Barcelona (ES); Fernando Latorre, Cambridge (GB); Pedro Lopez, Barcelona (ES); Enric Herrero Abellanas, Barcelona (ES); and Georgios Tournavitis, Barcelona (ES)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on May 3, 2021, as Appl. No. 17/306,877.
Application 17/306,877 is a continuation of application No. 15/521,856, filed as application No. PCT/US2015/061453 on Nov. 19, 2015, granted, now Pat. No. 10,997,273.
Claims priority of application No. 14382553 (EP), filed on Dec. 19, 2014.
Prior Publication US 2021/0326405 A1, Oct. 21, 2021
Int. Cl. G06F 17/15 (2006.01); G06F 12/0875 (2016.01); G06N 3/04 (2023.01); G06N 3/063 (2023.01)
CPC G06F 17/153 (2013.01) [G06F 12/0875 (2013.01); G06N 3/04 (2013.01); G06N 3/063 (2013.01); G06F 2212/1024 (2013.01)] 18 Claims
OG exemplary drawing
 
1. An apparatus to perform artificial neural network (ANN) processing comprising:
an input/output (I/O) interface;
a plurality of processing circuits communicatively coupled to the I/O interface to receive data for input neurons and synaptic weights associated with each of the input neurons, each of the plurality of processing circuits to process at least a portion of the data for the input neurons and the synaptic weights to generate partial results, wherein the plurality of processing circuits includes at least four processing circuits;
an interconnect communicatively coupling the plurality of processing circuits, each of the processing circuits to share the partial results with one or more other of the plurality of processing circuits over the interconnect, the one or more other of the processing circuits to use the partial results to generate additional partial results or final results; and
a plurality of latches to store the received data for the input neurons and the synaptic weights associated with each of the input neurons, wherein the received data for a given computation is to be broadcast to all other of the plurality of processing circuits by providing the latched received data to said all other of the plurality of processing circuits over a plurality of processing cycles to avoid reading the received data from the I/O interface multiple times during the given computation.
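The following is a minimal, hedged Python/NumPy sketch of the cooperative partial-result scheme recited in claim 1; it is not part of the patent text. The unit count, layer sizes, partitioning, and the software reduction that stands in for the interconnect are illustrative assumptions only, since the claim covers hardware processing circuits, latches, and an interconnect rather than software.

```python
import numpy as np

# Hypothetical parameters for illustration only (not taken from the patent):
NUM_UNITS = 4     # claim 1 requires at least four processing circuits
NUM_INPUTS = 16   # input neurons
NUM_OUTPUTS = 8   # output neurons

rng = np.random.default_rng(0)
inputs = rng.standard_normal(NUM_INPUTS)                   # data for input neurons
weights = rng.standard_normal((NUM_OUTPUTS, NUM_INPUTS))   # synaptic weights

# Each processing unit owns a slice of the input neurons and the matching
# weight columns; the input data is "latched" once and reused across cycles
# instead of being re-read from the I/O interface for the same computation.
input_slices = np.array_split(np.arange(NUM_INPUTS), NUM_UNITS)

def unit_partial_result(unit_id):
    """Compute one unit's partial sums for every output neuron."""
    cols = input_slices[unit_id]
    latched_inputs = inputs[cols]          # read once, held in local latches
    local_weights = weights[:, cols]
    return local_weights @ latched_inputs  # partial dot products

# Each unit produces partial results ...
partials = [unit_partial_result(u) for u in range(NUM_UNITS)]

# ... and shares them over the interconnect; here the sharing is modeled as
# an accumulation of partial results into the final output neuron values.
final = np.sum(partials, axis=0)

# Sanity check against a monolithic computation of the same layer.
assert np.allclose(final, weights @ inputs)
print(final)
```

In this sketch the broadcast of latched input data is what lets every unit reuse the same operands across processing cycles, mirroring the claim's stated goal of avoiding repeated reads from the I/O interface during a given computation.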