US 11,887,647 B2
Deep learning accelerator and random access memory with separate memory access connections
Poorna Kale, Folsom, CA (US); and Jaime Cummins, Bainbridge Island, WA (US)
Assigned to Micron Technology, Inc., Boise, ID (US)
Filed by Micron Technology, Inc., Boise, ID (US)
Filed on Apr. 9, 2020, as Appl. No. 16/844,993.
Prior Publication US 2021/0319822 A1, Oct. 14, 2021
Int. Cl. G06F 9/38 (2018.01); G11C 11/34 (2006.01); G06F 9/30 (2018.01); G06N 3/063 (2023.01); G06F 9/50 (2006.01); G06N 3/10 (2006.01); G06F 17/16 (2006.01)
CPC G11C 11/34 (2013.01) [G06F 9/30007 (2013.01); G06F 9/3877 (2013.01); G06F 9/3893 (2013.01); G06F 9/5027 (2013.01); G06F 17/16 (2013.01); G06N 3/063 (2013.01); G06N 3/10 (2013.01)]
19 Claims
 
1. A device, comprising:
an integrated circuit package enclosing components of the device, the components enclosed within the integrated circuit package including:
an accelerator for deep learning, the accelerator having:
at least one processing unit configured to execute instructions, each of the instructions having one or more matrix operands and configured to instruct the at least one processing unit to perform an operation on the one or more matrix operands;
a control unit;
local memory; and
a memory interface;
random access memory configured to have:
a first region configured to store the instructions and store matrices of an artificial neural network, the instructions executable by the at least one processing unit of the accelerator;
a second region configured to store inputs to the artificial neural network; and
a third region configured to store outputs generated by the accelerator autonomously executing the instructions to process, using the matrices stored in the first region, the inputs in the second region, wherein the control unit is configured to load, in response to the inputs being written into the second region, the instructions from the first region of the random access memory for execution by the at least one processing unit; and
at least two interfaces configured to access, via a connection between the memory interface of the accelerator and the random access memory, the random access memory concurrently by at least two devices that are external to the device, wherein the at least two interfaces include:
a first interface coupled to the third region and configured to provide a central processing unit configured outside of the integrated circuit package with access to obtain the outputs from the third region; and
a second interface coupled to the second region and configured to provide a direct memory access controller configured outside of the integrated circuit package with access to write the inputs into the second region.
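
The data flow recited in claim 1 can be illustrated with a small software model. The following is only a sketch under stated assumptions, not the claimed hardware or any Micron implementation: every class, method, and opcode name (RandomAccessMemory, Accelerator, DmaInterface, CpuInterface, "MATMUL", "RELU") is hypothetical, the three regions are modeled as plain dictionaries, and the two interfaces are exercised one after the other, whereas the claim recites concurrent access by two external devices.

```python
from dataclasses import dataclass, field


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    """Element-wise max(0, x), standing in for an activation instruction."""
    return [[max(0.0, v) for v in row] for row in m]


@dataclass
class RandomAccessMemory:
    # Hypothetical model of the three regions recited in the claim.
    region1: dict = field(default_factory=dict)  # instructions and ANN matrices
    region2: dict = field(default_factory=dict)  # inputs to the ANN
    region3: dict = field(default_factory=dict)  # outputs of the accelerator


class Accelerator:
    """Toy accelerator: control unit, local memory, and a processing loop."""

    def __init__(self, ram):
        self.ram = ram          # reached through the (modeled) memory interface
        self.local_memory = {}

    def on_input_written(self):
        # Control unit: in response to inputs landing in region 2,
        # load the instructions from region 1 and execute them.
        program = list(self.ram.region1["instructions"])
        matrices = self.ram.region1["matrices"]
        self.local_memory["acc"] = self.ram.region2["input"]
        for opcode, operand in program:
            if opcode == "MATMUL":   # instruction with a matrix operand
                self.local_memory["acc"] = matmul(self.local_memory["acc"], matrices[operand])
            elif opcode == "RELU":
                self.local_memory["acc"] = relu(self.local_memory["acc"])
            else:
                raise ValueError(f"unknown opcode: {opcode}")
        self.ram.region3["output"] = self.local_memory["acc"]


class DmaInterface:
    """Second interface: lets an external DMA controller write region 2."""

    def __init__(self, ram, accelerator):
        self.ram = ram
        self.accelerator = accelerator

    def write_input(self, matrix):
        self.ram.region2["input"] = matrix
        self.accelerator.on_input_written()  # the write itself triggers execution


class CpuInterface:
    """First interface: lets an external CPU read region 3."""

    def __init__(self, ram):
        self.ram = ram

    def read_output(self):
        return self.ram.region3.get("output")


# Usage: preload region 1, write an input via DMA, read the result via the CPU port.
ram = RandomAccessMemory()
ram.region1["matrices"] = {"W0": [[1.0, -1.0], [2.0, 0.0]]}
ram.region1["instructions"] = [("MATMUL", "W0"), ("RELU", None)]
accelerator = Accelerator(ram)
dma_port = DmaInterface(ram, accelerator)
cpu_port = CpuInterface(ram)
dma_port.write_input([[1.0, 2.0]])
result = cpu_port.read_output()
```

In this model the CPU never starts the computation: writing into the second region is the only trigger, which mirrors the "autonomously executing" language of the claim. A closer model would run the two ports on separate threads with region-level locking, which is omitted here for brevity.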