US 12,436,884 B2
Memory processing unit core architectures
Jacob Botimer, Ann Arbor, MI (US); Mohammed Zidan, Ann Arbor, MI (US); Timothy Wesley, Ann Arbor, MI (US); Chester Liu, Ann Arbor, MI (US); and Wei Lu, Ann Arbor, MI (US)
Assigned to MemryX Incorporated, Ann Arbor, MI (US)
Filed by MemryX Incorporated, Ann Arbor, MI (US)
Filed on Sep. 12, 2022, as Appl. No. 17/943,116.
Application 17/943,116 is a continuation of application No. PCT/US2021/048466, filed on Aug. 31, 2021.
Claims priority of provisional application 63/072,904, filed on Aug. 31, 2020.
Prior Publication US 2023/0073012 A1, Mar. 9, 2023
Int. Cl. G06F 12/02 (2006.01); G06F 3/06 (2006.01); G06F 9/46 (2006.01); G06F 12/06 (2006.01); G06F 12/12 (2016.01); G06N 3/045 (2023.01); G06N 3/063 (2023.01); G11C 11/54 (2006.01)
CPC G06F 12/0607 (2013.01) [G06F 3/0611 (2013.01); G06F 3/0659 (2013.01); G06F 3/0673 (2013.01); G06F 9/46 (2013.01); G06F 12/0238 (2013.01); G06N 3/045 (2023.01); G06N 3/063 (2013.01); G11C 11/54 (2013.01)] 6 Claims
OG exemplary drawing
 
1. A memory processing unit (MPU) comprising:
a first memory including a plurality of regions;
a plurality of processing regions interleaved between the plurality of regions of the first memory, wherein the processing regions include a plurality of compute cores configurable in one or more clusters, wherein the plurality of compute cores of respective ones of the plurality of processing regions are coupled between adjacent ones of the plurality of regions of the first memory, and wherein the plurality of compute cores of respective ones of the plurality of processing regions are configurably couplable in series; and
a second memory coupled to the plurality of processing regions, wherein the second memory comprises a plurality of memory macros and wherein organization and storage of a weight array in a given one of the plurality of memory macros comprises:
quantizing the weight array;
unrolling each filter of the quantized weight array and appending bias and exponent entries;
reshaping the unrolled and appended filters to fit into corresponding physical channels;
rotating the reshaped unrolled and appended filters; and
loading virtual channels of the rotated reshaped unrolled and appended filters into physical channels of the given one of the plurality of memory macros.
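
The final wherein clause of claim 1 recites a five-step procedure for packing a weight array into one memory macro. The Python/NumPy sketch below illustrates one plausible reading of those steps for a convolutional weight tensor; the physical channel width and count, the block-floating-point quantization (8-bit mantissas with a per-filter shared exponent), the row-wise circular-shift rotation, and the round-robin loading of virtual channels are all assumptions not specified in the claim text.

```python
import numpy as np

# Hypothetical parameters: the claim does not specify channel geometry
# or quantization bit depth, so these values are assumptions.
PHYSICAL_CHANNEL_WIDTH = 64   # entries per physical memory channel (assumed)
NUM_PHYSICAL_CHANNELS = 16    # physical channels per memory macro (assumed)


def pack_weight_array(weights, biases):
    """Sketch of the claimed weight-packing steps for one memory macro.

    weights: float array of shape (num_filters, kh, kw, in_channels)
    biases:  float array of shape (num_filters,)
    """
    # 1. Quantize the weight array (block floating point assumed:
    #    8-bit signed mantissas sharing one exponent per filter).
    exponents = np.ceil(np.log2(np.abs(weights).max(axis=(1, 2, 3)) + 1e-12))
    scales = 2.0 ** (exponents - 7)
    quantized = np.clip(np.round(weights / scales[:, None, None, None]),
                        -128, 127).astype(np.int32)

    virtual_channels = []
    for f in range(quantized.shape[0]):
        # 2. Unroll each filter and append its bias and exponent entries.
        unrolled = quantized[f].ravel()
        unrolled = np.concatenate(
            [unrolled,
             [int(round(biases[f] / scales[f])), int(exponents[f])]])

        # 3. Reshape the unrolled-and-appended filter to fit the physical
        #    channel width (zero-padded to a whole number of rows).
        pad = (-len(unrolled)) % PHYSICAL_CHANNEL_WIDTH
        rows = np.pad(unrolled, (0, pad)).reshape(-1, PHYSICAL_CHANNEL_WIDTH)

        # 4. Rotate the reshaped filter (row-wise circular shift assumed)
        #    so successive filters start at successive channel offsets.
        rows = np.roll(rows, shift=f, axis=1)

        virtual_channels.append(rows)

    # 5. Load the virtual channels into the macro's physical channels
    #    (round-robin assignment assumed).
    macro = [[] for _ in range(NUM_PHYSICAL_CHANNELS)]
    for v, rows in enumerate(virtual_channels):
        macro[v % NUM_PHYSICAL_CHANNELS].append(rows)
    return macro
```

Under this reading, appending the bias and exponent to each unrolled filter keeps all per-filter parameters contiguous within the same physical channels, which would let a compute core stream a complete filter from the second memory without indirection; the specific quantization, rotation, and channel-assignment choices shown here are illustrative only.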