US 11,704,535 B1
Hardware architecture for a neural network accelerator
Kumar S. S. Vemuri, Hyderabad (IN); Mahesh S. Mahadurkar, Wani (IN); Pavan K. Nadimpalli, Gudivada (IN); and Venkat Praveen K. Kancharlapalli, Vijayawada (IN)
Assigned to XILINX, INC., San Jose, CA (US)
Filed by Xilinx, Inc., San Jose, CA (US)
Filed on May 17, 2019, as Appl. No. 16/415,907.
Int. Cl. G06N 3/04 (2023.01); G06F 3/06 (2006.01); G06N 3/063 (2023.01); G06F 5/16 (2006.01); G06N 3/02 (2006.01); G06N 3/06 (2006.01); G06N 20/00 (2019.01)
CPC G06N 3/04 (2013.01) [G06F 3/0655 (2013.01); G06F 3/0656 (2013.01); G06N 3/063 (2013.01); G06F 3/061 (2013.01); G06F 3/0613 (2013.01); G06F 3/0635 (2013.01); G06F 3/0679 (2013.01); G06F 3/0688 (2013.01); G06F 5/16 (2013.01); G06N 3/02 (2013.01); G06N 3/06 (2013.01); G06N 20/00 (2019.01)] 21 Claims
OG exemplary drawing
 
1. An integrated circuit (IC), comprising:
a digital processing engine (DPE) array having a plurality of DPEs configured to execute one or more layers of a neural network;
reconfigurable integrated circuitry configured to include:
an input/output (IO) controller configured to receive input data to be processed by the DPE array based on the one or more layers of the neural network;
a feeding controller configured to feed the input data from the IO controller to the DPE array executing the one or more layers of the neural network;
a weight controller configured to provide weight parameters used for processing the input data through the one or more layers of the neural network to the DPE array;
an output controller configured to receive processed data from the DPE array based on the one or more layers of the neural network; and
configurable buffers configured to communicate with the IO controller, the feeding controller, the weight controller, and the output controller to facilitate data processing among the IO controller, the feeding controller, and the weight controller by alternating between storing and processing data in the configurable buffers;
wherein a first one of the DPEs comprises a neural network unit (NNU) configured to process the input data; and
wherein the NNU comprises digital signal processors (DSPs) configured to process the input data at a frequency that is at least double a frequency at which the reconfigurable integrated circuitry is configured to operate.
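The configurable buffers recited in the claim alternate between storing and processing data, i.e. ping-pong (double) buffering: while a producer (e.g. the IO controller) fills one buffer bank, a consumer (e.g. the feeding controller) drains the other, and the roles then swap. The following is an illustrative software sketch of that alternation only; the class and method names are hypothetical and do not appear in the patent.

```python
class PingPongBuffer:
    """Two-bank buffer that alternates between a fill role and a
    process role, modeling the claimed store/process alternation."""

    def __init__(self, depth):
        self.banks = [[None] * depth, [None] * depth]
        self.fill_bank = 0  # index of the bank currently being written

    def store(self, data):
        """Producer (e.g. IO controller) writes a burst into the fill bank."""
        bank = self.banks[self.fill_bank]
        for i, word in enumerate(data[: len(bank)]):
            bank[i] = word

    def swap(self):
        """Swap roles: the just-filled bank becomes the processing bank."""
        self.fill_bank ^= 1

    def process_view(self):
        """Consumer (e.g. feeding controller) reads the bank not being filled."""
        return self.banks[self.fill_bank ^ 1]


buf = PingPongBuffer(4)
buf.store([1, 2, 3, 4])   # producer fills bank 0
buf.swap()                # bank 0 becomes the processing bank
first = buf.process_view()  # consumer drains bank 0 while bank 1 is filled
```

Because the two banks are accessed by different controllers in any given phase, storing and processing overlap in time, which is what lets the buffers "facilitate data processing" between the IO, feeding, and weight controllers.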
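The final limitation claims DSPs clocked at no less than twice the frequency of the surrounding reconfigurable circuitry. One consequence of such "double-pumped" clocking is that each DSP can absorb two multiply-accumulate (MAC) operations per fabric-clock cycle. The sketch below is an illustrative behavioral model of that throughput relationship under this assumption; the function names are hypothetical and are not drawn from the patent.

```python
def fabric_cycle_macs(pairs, acc=0):
    """One fabric-clock cycle: a double-pumped DSP performs up to two
    MACs (one per DSP-clock cycle) on the queued operand pairs."""
    for a, b in pairs[:2]:  # two DSP cycles fit in one fabric cycle
        acc += a * b
    return acc


def dot_product(xs, ws):
    """Dot product streamed through one DSP, two operand pairs
    (input sample, weight) per fabric-clock cycle."""
    acc = 0
    pairs = list(zip(xs, ws))
    for i in range(0, len(pairs), 2):
        acc = fabric_cycle_macs(pairs[i : i + 2], acc)
    return acc
```

For an N-element dot product, this model completes in ceil(N/2) fabric cycles rather than N, illustrating why running the NNU's DSPs at twice the fabric frequency doubles effective MAC throughput without widening the datapath.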