CPC G06F 9/3887 (2013.01) [G06F 9/3001 (2013.01); G06F 9/3836 (2013.01)] | 26 Claims |
1. An AI accelerator apparatus configured with in-memory compute, the apparatus comprising:
one or N chiplets, where N is an integer greater than 1, each of the chiplets comprising a plurality of tiles, and each of the tiles comprising:
a plurality of slices,
a central processing unit (CPU) coupled to the plurality of slices, and
a hardware dispatch device coupled to the CPU;
a first clock configured to output a clock signal ranging from 0.5 GHz to 4 GHz;
a plurality of die-to-die (D2D) interconnects coupled to each of the CPUs in each of the tiles;
a peripheral component interconnect express (PCIe) bus coupled to the CPUs in each of the tiles;
a dynamic random access memory (DRAM) interface coupled to the CPUs in each of the tiles;
a global reduced instruction set computer (RISC) interface coupled to each of the CPUs in each of the tiles;
wherein each of the slices includes a digital in-memory compute (DIMC) device configured to allow for a throughput of one or more matrix computations provided in the DIMC device such that the throughput is characterized by 512 multiply accumulates per clock cycle;
wherein the DIMC device is configured to accelerate the one or more matrix computations for a generative AI application;
wherein the DIMC device is coupled to a second clock configured at an output rate of one half of the rate of the first clock; and
a substrate member configured to provide mechanical support and having a surface region and an interposer, the surface region being coupled to support the one or N chiplets, and the one or N chiplets being coupled to each other using the interposer.
|
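
The figures recited in claim 1 (a first clock of 0.5 GHz to 4 GHz, a second clock at one half of that rate, and 512 multiply accumulates per clock cycle per DIMC device) can be combined into a rough peak-throughput estimate. The following is a minimal sketch, not the claimed implementation. It assumes the 512 multiply accumulates are counted against the second (DIMC) clock, which the claim does not state explicitly, and the class name `AcceleratorConfig`, its fields, and the chiplet, tile, and slice counts in the usage lines are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

# From claim 1: each DIMC device sustains 512 multiply accumulates per clock cycle.
MACS_PER_CYCLE = 512

# From claim 1: the first clock ranges from 0.5 GHz to 4 GHz.
FIRST_CLOCK_MIN_HZ = 0.5e9
FIRST_CLOCK_MAX_HZ = 4e9


@dataclass(frozen=True)
class AcceleratorConfig:
    """Hypothetical description of one apparatus; all counts are illustrative."""

    first_clock_hz: float
    chiplets: int           # "one or N chiplets"
    tiles_per_chiplet: int  # "a plurality of tiles" per chiplet
    slices_per_tile: int    # "a plurality of slices" per tile, one DIMC device each

    def __post_init__(self) -> None:
        if not FIRST_CLOCK_MIN_HZ <= self.first_clock_hz <= FIRST_CLOCK_MAX_HZ:
            raise ValueError("first clock must lie between 0.5 GHz and 4 GHz")
        if self.chiplets < 1:
            raise ValueError("at least one chiplet is required")
        if self.tiles_per_chiplet < 2 or self.slices_per_tile < 2:
            raise ValueError("a plurality means at least two tiles and two slices")

    @property
    def dimc_clock_hz(self) -> float:
        # From claim 1: the second clock runs at one half of the first clock rate.
        return self.first_clock_hz / 2

    @property
    def dimc_count(self) -> int:
        # One DIMC device per slice, summed over every tile of every chiplet.
        return self.chiplets * self.tiles_per_chiplet * self.slices_per_tile

    def peak_macs_per_second(self) -> float:
        # ASSUMPTION: the 512 MACs per cycle are counted on the DIMC (second) clock.
        return self.dimc_count * MACS_PER_CYCLE * self.dimc_clock_hz


# Illustrative numbers only: 1 chiplet, 2 tiles, 2 slices per tile, 2 GHz first clock.
cfg = AcceleratorConfig(first_clock_hz=2e9, chiplets=1,
                        tiles_per_chiplet=2, slices_per_tile=2)
print(cfg.dimc_clock_hz)           # DIMC clock in Hz
print(cfg.peak_macs_per_second())  # aggregate peak MAC/s under the stated assumption
```

If the 512 multiply accumulates were instead counted against the first clock, the estimate would double. The sketch keeps that choice in a single method so the assumption is easy to change.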