US 12,093,811 B2
Fractal calculating device and method, integrated circuit and board card
Shaoli Liu, Beijing (CN); Guang Jiang, Beijing (CN); Yongwei Zhao, Beijing (CN); and Jun Liang, Beijing (CN)
Assigned to CAMBRICON TECHNOLOGIES CORPORATION LIMITED, Beijing (CN)
Filed by CAMBRICON TECHNOLOGIES CORPORATION LIMITED, Beijing (CN)
Filed on Dec. 23, 2021, as Appl. No. 17/560,411.
Application 17/560,411 is a division of application No. 17/606,838, previously published as PCT/CN2020/087043, filed on Apr. 26, 2020.
Claims priority of application No. 201910347027.0 (CN), filed on Apr. 27, 2019; application No. 201910544723.0 (CN), filed on Jun. 21, 2019; application No. 201910544726.4 (CN), filed on Jun. 21, 2019; application No. 201910545270.3 (CN), filed on Jun. 21, 2019; application No. 201910545271.8 (CN), filed on Jun. 21, 2019; and application No. 201910545272.2 (CN), filed on Jun. 21, 2019.
Prior Publication US 2022/0188614 A1, Jun. 16, 2022
Int. Cl. G06N 3/063 (2023.01); G06N 3/08 (2023.01); G06N 20/00 (2019.01)
CPC G06N 3/063 (2013.01) [G06N 3/08 (2013.01); G06N 20/00 (2019.01)] 14 Claims
 
1. A calculation unit for calculating data according to input instructions, wherein the calculation unit is divided into an M-stage pipeline, the calculation unit comprising:
a serial decomposer configured to decompose the input instructions into N serial instructions, and generate intermediate data reflecting the dependency among the N serial instructions;
a local memory including a static memory segment to store the intermediate data, and a loop memory segment to store sub-data required to execute the N serial instructions and data of each stage generated by a calculation of the sub-data in the M-stage pipeline, where the sub-data is a part of the data, wherein the loop memory segment is divided into N sub-memory blocks, each sub-memory block stores specific sub-data required to execute specific serial instructions and the data at each stage generated by the specific sub-data in the M-stage pipeline;
a down-layer decoder configured to assign a local address of the specific sub-data; and
a direct memory access (DMA) configured to sequentially perform data exchange between each sub-memory block and an upper layer of memory according to the local address.
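The claimed arrangement can be modeled in software to make the data flow concrete. The sketch below is illustrative only. It is not the patentee's implementation, and every name in it (serial_decompose, LocalMemory, decode_local_address, dma_load, dma_store, run_unit) is hypothetical. It rests on several assumptions. M = 3 pipeline stages (DMA load, compute, DMA store). The "calculation" is an elementwise square. Serial instruction i always uses sub-memory block i. The intermediate data kept in the static segment is a simple chain recording that instruction i follows instruction i-1. In each pipeline cycle, stage s works on instruction (cycle - s), so up to M serial instructions are in flight at once, each isolated in its own sub-memory block.

```python
from dataclasses import dataclass


@dataclass
class SerialInstruction:
    index: int       # position among the N serial instructions
    upper_addr: int  # start of this instruction's sub-data in upper-layer memory
    length: int      # number of elements of sub-data


def serial_decompose(total_len, n):
    """Hypothetical serial decomposer: split one elementwise instruction over
    `total_len` elements into n serial instructions, and return intermediate
    data recording an (assumed) dependency chain i-1 -> i."""
    step = -(-total_len // n)  # ceiling division, so n blocks cover all data
    instrs = [
        SerialInstruction(i, i * step, max(0, min(step, total_len - i * step)))
        for i in range(n)
    ]
    deps = {i: [i - 1] if i > 0 else [] for i in range(n)}
    return instrs, deps, step


class LocalMemory:
    def __init__(self, n_blocks, block_size):
        self.static = {}  # static memory segment: holds intermediate data
        self.loop = [0] * (n_blocks * block_size)  # loop segment: N sub-memory blocks


def decode_local_address(instr, block_size):
    """Hypothetical down-layer decoder: sub-memory block i starts at i * block_size."""
    return instr.index * block_size


def dma_load(upper, local, instr, addr):
    # DMA: copy the instruction's sub-data from upper-layer memory into its block
    src = upper[instr.upper_addr:instr.upper_addr + instr.length]
    local.loop[addr:addr + instr.length] = src


def compute(local, instr, addr):
    # Stand-in operation (assumed): square each element of the block in place
    for k in range(addr, addr + instr.length):
        local.loop[k] = local.loop[k] * local.loop[k]


def dma_store(upper_out, local, instr, addr):
    # DMA: write the block's result back to upper-layer memory
    upper_out[instr.upper_addr:instr.upper_addr + instr.length] = \
        local.loop[addr:addr + instr.length]


def run_unit(upper, n):
    """Run the data in `upper` through N serial instructions on an M = 3 stage
    pipeline; return the output list and the (cycle, stage, instruction) trace."""
    instrs, deps, block_size = serial_decompose(len(upper), n)
    local = LocalMemory(n, block_size)
    local.static["deps"] = deps  # intermediate data kept in the static segment
    out = [None] * len(upper)
    addrs = [decode_local_address(ins, block_size) for ins in instrs]
    stages = [
        ("load", lambda ins, a: dma_load(upper, local, ins, a)),
        ("compute", lambda ins, a: compute(local, ins, a)),
        ("store", lambda ins, a: dma_store(out, local, ins, a)),
    ]
    m = len(stages)
    schedule = []
    for cycle in range(n + m - 1):
        # Stage s handles instruction (cycle - s); later stages run first so
        # each instruction moves load -> compute -> store on successive cycles.
        for s in reversed(range(m)):
            i = cycle - s
            if 0 <= i < n:
                name, fn = stages[s]
                fn(instrs[i], addrs[i])
                schedule.append((cycle, name, i))
    return out, schedule


if __name__ == "__main__":
    result, trace = run_unit([1, 2, 3, 4, 5, 6, 7], n=3)
    print(result)  # [1, 4, 9, 16, 25, 36, 49]
    for entry in trace:
        print(entry)
```

Giving each serial instruction its own sub-memory block is what lets the load of instruction i+1 overlap the compute of instruction i without overwriting its operands. In this model the N instructions finish in N + M - 1 cycles rather than N × M.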