US 11,669,715 B2
Hardware architecture for accelerating artificial intelligent processor
Pingping Shao, San Jose, CA (US)
Assigned to Shanghai Iluvatar CoreX Semiconductor Co., Ltd., Shanghai (CN)
Filed by Nanjing Iluvatar CoreX Technology Co., Ltd., Nanjing (CN)
Filed on Apr. 30, 2019, as Appl. No. 16/237,610.
Claims priority of application No. 201810862182.1 (CN), filed on Aug. 1, 2018.
Prior Publication US 2020/0042867 A1, Feb. 6, 2020
Int. Cl. G06N 3/04 (2023.01); G06F 9/50 (2006.01); G06F 13/00 (2006.01); G06N 3/063 (2023.01); G06T 1/20 (2006.01); G06F 13/12 (2006.01)
CPC G06N 3/04 (2013.01) [G06F 9/5027 (2013.01); G06F 13/00 (2013.01); G06N 3/063 (2013.01); G06T 1/20 (2013.01)] 8 Claims
OG exemplary drawing
 
1. A hardware architecture for accelerating an artificial intelligence processor, comprising:
wherein an artificial intelligence task is regarded as a 5D tensor in an application-specific integrated circuit (ASIC), and, in each dimension, the task is divided into groups, each group being further divided into a plurality of waves;
a host,
a frontal engine,
a parietal engine,
a renderer engine,
an occipital engine,
a temporal engine, and
a memory, wherein the memory comprises random access memory (RAM);
wherein the frontal engine obtains the 5D tensor from the host and divides it into a plurality of tensors;
wherein the parietal engine sends the plurality of tensors to the renderer engine;
wherein the renderer engine is configured to execute an input feature renderer and output a partial tensor to the occipital engine;
wherein the occipital engine accumulates the partial tensors and executes an output feature renderer to obtain a final tensor, which is sent to the temporal engine; and
wherein the temporal engine performs data compression and writes the final tensor into the memory.
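The claim's tiling scheme, in which a 5D tensor is divided into groups along each dimension and each group is further divided into waves, can be sketched as follows. The group and wave sizes, the example shape, and all function names are illustrative assumptions; the patent does not specify concrete values.

```python
# Sketch of per-dimension group/wave tiling of a 5D index space.
# Group and wave sizes below are assumed for illustration only.
from itertools import product

def split_dim(extent, group_size, wave_size):
    """Split one dimension's extent into groups; split each group
    into waves. Returns a list of groups, each a list of
    half-open (start, end) wave ranges."""
    groups = []
    for g0 in range(0, extent, group_size):
        g1 = min(g0 + group_size, extent)
        waves = [(w0, min(w0 + wave_size, g1))
                 for w0 in range(g0, g1, wave_size)]
        groups.append(waves)
    return groups

def tile_5d(shape, group_sizes, wave_sizes):
    """Yield one wave at a time as a 5-tuple of per-dimension
    (start, end) ranges, covering the whole index space group by group."""
    per_dim = [split_dim(e, g, w)
               for e, g, w in zip(shape, group_sizes, wave_sizes)]
    # Iterate over every combination of one group per dimension...
    for group_combo in product(*per_dim):
        # ...then over every combination of one wave per dimension.
        for wave_combo in product(*group_combo):
            yield wave_combo

shape = (2, 8, 8, 16, 16)   # assumed 5D tensor shape (e.g. N, C, D, H, W)
groups = (1, 4, 4, 8, 8)    # assumed group size per dimension
waves = (1, 2, 2, 4, 4)     # assumed wave size per dimension
all_waves = list(tile_5d(shape, groups, waves))
```

With these assumed sizes, the waves exactly partition the index space: every element of the 5D tensor falls in exactly one wave, and every wave lies inside exactly one group.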
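The dataflow among the five engines can be modeled as a simple software pipeline. The patent describes hardware engines; the sketch below stands in plain Python functions for them, and every name, the stand-in computations (a doubling for the input feature renderer, an identity pass for the output feature renderer), and the dictionary model of memory are illustrative assumptions, not details from the patent.

```python
# Hedged software model of the claimed engine pipeline:
# frontal -> parietal -> renderer -> occipital -> temporal -> memory.
import zlib

def frontal_engine(tensor, num_parts):
    """Divide the tensor (flattened here) into a plurality of sub-tensors."""
    step = max(1, len(tensor) // num_parts)
    return [tensor[i:i + step] for i in range(0, len(tensor), step)]

def parietal_engine(sub_tensors):
    """Send the sub-tensors on toward the renderer engine."""
    yield from sub_tensors

def renderer_engine(sub_tensor):
    """Execute the input feature renderer; emit a partial tensor.
    A doubling stands in for the actual per-feature computation."""
    return [2 * x for x in sub_tensor]

def occipital_engine(partials):
    """Accumulate partial tensors and apply the output feature
    renderer (modeled as an identity pass) to get the final tensor."""
    final = []
    for p in partials:
        final.extend(p)
    return final

def temporal_engine(final_tensor, memory):
    """Compress the final tensor and write it into memory (RAM model)."""
    raw = bytes(v % 256 for v in final_tensor)
    memory["result"] = zlib.compress(raw)

memory = {}
host_tensor = list(range(64))                     # stand-in for a 5D tensor
parts = frontal_engine(host_tensor, num_parts=4)  # frontal: divide
partials = [renderer_engine(t)
            for t in parietal_engine(parts)]      # parietal + renderer
final = occipital_engine(partials)                # occipital: accumulate
temporal_engine(final, memory)                    # temporal: compress, write
```

The point of the sketch is the ordering of responsibilities the claim recites (divide, dispatch, render, accumulate, compress-and-write), not any particular computation inside each stage.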