US 12,293,804 B2
Convolution operation accelerator and convolution operation method
Xiangshui Miao, Hubei (CN); Jiawei Fu, Hubei (CN); and Yuhui He, Hubei (CN)
Assigned to HUAZHONG UNIVERSITY OF SCIENCE AND TECHNOLOGY, Hubei (CN)
Appl. No. 18/266,610
Filed by HUAZHONG UNIVERSITY OF SCIENCE AND TECHNOLOGY, Hubei (CN)
PCT Filed Apr. 20, 2022, PCT No. PCT/CN2022/087794
§ 371(c)(1), (2) Date Jun. 12, 2023,
PCT Pub. No. WO2023/173530, PCT Pub. Date Sep. 21, 2023.
Claims priority of application No. 202210272801.8 (CN), filed on Mar. 18, 2022.
Prior Publication US 2025/0022490 A1, Jan. 16, 2025
Int. Cl. G11C 5/02 (2006.01); G06N 3/0464 (2023.01); G11C 5/06 (2006.01); G06N 3/045 (2023.01); G06N 3/063 (2023.01)
CPC G11C 5/02 (2013.01) [G06N 3/0464 (2023.01); G11C 5/063 (2013.01); G06N 3/045 (2023.01); G06N 3/063 (2013.01); Y02D 10/00 (2018.01)]
10 Claims
 
1. A convolution operation accelerator, comprising: a three-dimensional non-volatile memory array and a control module, wherein
the three-dimensional non-volatile memory array comprises P word line electrode layers, a bit line electrode layer is placed between any two adjacent word line electrode layers, a non-volatile memory cell array is placed between any adjacent word line electrode layer and bit line electrode layer, and the non-volatile memory cell array is vertically connected to both the word line electrode layer and the bit line electrode layer,
the word line electrode layers comprise a plurality of word line electrodes arranged in parallel, and the word line electrodes in the P word line electrode layers together form a word line electrode array,
each column of non-volatile memory cells in the non-volatile memory cell array is connected onto a same word line in the word line electrode layer connected to the non-volatile memory cell array, the non-volatile memory cells on each oblique line in the non-volatile memory cell array are connected onto a same bit line in a bit line electrode array connected to the non-volatile memory cell array, and the oblique line is an oblique line in the non-volatile memory cell array parallel to a corresponding diagonal line in the non-volatile memory cell array,
a size of two-dimensional input data is denoted as M×N,
when a size of a convolution kernel being subjected to a convolution operation with the two-dimensional input data is 2k×c, the control module is configured to split the convolution kernel into k convolution kernel units with a size of 2×c by row, where k is a positive integer, select and arrange k different sub-array units with a size of (M−2(k−1))×N in the word line electrode array to correspond to the k convolution kernel units with the size of 2×c one to one according to a splitting sequence of the convolution kernel, store each convolution kernel unit and N−c copies thereof in the two layers of the non-volatile memory cell arrays between all adjacent two word line electrode layers in the corresponding sub-array unit, and apply data from a (2i−1)th row to a (M−2(k−i))th row of the two-dimensional input data to the corresponding word line electrode in an ith sub-array unit in the form of voltage according to corresponding coordinate information, where i=1, 2, . . . , k,
when the size of the convolution kernel being subjected to the convolution operation with the two-dimensional input data is (2k+1)×c, the control module is configured to split the convolution kernel into k convolution kernel units with a size of 2×c and one convolution kernel unit with a size of 1×c by row, select k sub-array units with a size of (M−2(k−1)−1)×N and one sub-array unit with a size of (M−2k)×N in the word line electrode array, arrange the k sub-array units with the size of (M−2(k−1)−1)×N to correspond to the k convolution kernel units with the size of 2×c one to one according to the splitting sequence of the convolution kernel, store each convolution kernel unit with the size of 2×c and N−c copies thereof in the two layers of the non-volatile memory cell arrays between all adjacent two word line electrode layers in the corresponding sub-array unit, for each word line electrode layer in the sub-array unit with the size of (M−2k)×N, store the convolution kernel unit with the size of 1×c and the N−c copies of the convolution kernel unit with the size of 1×c in one of the non-volatile memory cell arrays connected to it and set all the non-volatile memory cells in the non-volatile memory cell array not storing the convolution kernel unit to a high-impedance state, apply data from the (2i−1)th row to a (M−2(k−i)−1)th row on the two-dimensional input data to the corresponding word line electrode in the ith sub-array unit with the size of (M−2(k−1)−1)×N in the form of voltage according to the corresponding coordinate information, where i=1, 2, . . . , k, and apply data from a (2k+1)th row to a Mth row on the two-dimensional input data to the corresponding word line electrode in the sub-array unit with the size of (M−2k)×N in the form of voltage according to the corresponding coordinate information,
the three-dimensional non-volatile memory array is configured to achieve in parallel a dot product operation between the convolution kernel units and different parts of the two-dimensional input data based on the non-volatile memory cell array and is configured to output in parallel a sum of dot product operation results of the convolution kernel units and the corresponding parts of the two-dimensional input data via the corresponding bit line electrode layer, so as to achieve the convolution operation between the convolution kernel and the two-dimensional input data,
wherein after being sequentially stored on the corresponding oblique line of the corresponding non-volatile memory cell array, each row of convolution kernel data of the convolution kernel units moves horizontally in a sliding direction of the convolution kernel onto the adjacent N−c oblique lines, the row of convolution kernel data is stored again, each layer of the non-volatile memory cell array stores the convolution kernel data of the corresponding row in the convolution kernel unit, and N−c copies of the convolution kernel data are implemented.
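
The row bookkeeping recited in claim 1 can be illustrated with a short sketch. It is not part of the claim: the function name `split_plan` and its tuple layout are hypothetical, chosen only for illustration. For a kernel with a given number of rows and an input with M rows, it lists each kernel unit's height and the 1-based, inclusive range of input rows applied to that unit's sub-array, following the 2k×c and (2k+1)×c cases as worded in the claim.

```python
def split_plan(M, kernel_rows):
    """Return one (unit_height, first_row, last_row) tuple per kernel unit.

    Rows are 1-based and inclusive, matching the claim's wording.
    Hypothetical helper for illustration only.
    """
    if kernel_rows < 2 or kernel_rows > M:
        raise ValueError("kernel must have at least 2 rows and at most M rows")
    k, odd = divmod(kernel_rows, 2)  # odd == 1 for a (2k+1)-row kernel
    plan = []
    for i in range(1, k + 1):
        first = 2 * i - 1                # (2i-1)th input row
        last = M - 2 * (k - i) - odd     # (M-2(k-i))th row, one less if odd
        plan.append((2, first, last))
    if odd:
        plan.append((1, 2 * k + 1, M))   # trailing 1 x c unit gets rows 2k+1..M
    return plan


if __name__ == "__main__":
    for unit_height, first, last in split_plan(6, 3):
        print(unit_height, first, last, "rows in sub-array:", last - first + 1)
```

In this sketch every 2×c unit receives M−2(k−1) rows in the even case and M−2(k−1)−1 rows in the odd case, which agrees with the sub-array unit sizes stated in the claim.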
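
The claim states that the summed dot products of the kernel units reproduce the convolution with the whole kernel. The following is a minimal numerical check of that statement. It assumes a stride of 1, no padding, and no kernel flip (a sliding dot product); the claim itself does not spell these out. The names `correlate_valid` and `split_convolve` are hypothetical, and plain integers stand in for voltages and conductances.

```python
def correlate_valid(x, w):
    """Stride-1, unpadded sliding dot product of kernel w over 2-D input x."""
    M, N = len(x), len(x[0])
    H, C = len(w), len(w[0])
    return [[sum(w[a][b] * x[r + a][s + b] for a in range(H) for b in range(C))
             for s in range(N - C + 1)]
            for r in range(M - H + 1)]


def split_convolve(x, w):
    """Split w by row into 2 x c units (plus a 1 x c remainder) and sum the parts."""
    M, H = len(x), len(w)
    out_rows = M - H + 1
    total = None
    for top in range(0, H, 2):
        unit = w[top:top + 2]                          # 2 x c unit or final 1 x c unit
        rows = x[top:top + out_rows + len(unit) - 1]   # input rows fed to this sub-array
        part = correlate_valid(rows, unit)
        if total is None:
            total = part
        else:
            # Element-wise sum, standing in for current summation on shared bit lines.
            total = [[p + q for p, q in zip(tr, pr)] for tr, pr in zip(total, part)]
    return total


if __name__ == "__main__":
    x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 1, 2, 3], [4, 5, 6, 7], [8, 9, 1, 2]]
    w = [[1, 0], [2, 1], [0, 3]]
    print(split_convolve(x, w) == correlate_valid(x, w))
```

The row slice handed to each unit starts at input row 2i−1 (1-based) and has the length recited for the corresponding sub-array unit, so each partial result already has the shape of the final output.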
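
The final "wherein" clause describes storing one row of kernel data on an oblique line and then repeating it on the adjacent N−c oblique lines. One way to picture this is as a banded weight matrix. In this sketch the matrix rows are taken to be word lines and the columns bit lines, with zero standing for an idealized high-impedance cell. That index mapping, and the names `replicate_row` and `bit_line_currents`, are assumptions made for illustration and are not taken from the patent.

```python
def replicate_row(w_row, N):
    """Build g[word_line][bit_line] holding w_row plus its N - c shifted copies."""
    c = len(w_row)
    g = [[0] * (N - c + 1) for _ in range(N)]
    for s in range(N - c + 1):           # s = 0 is the original, s >= 1 are the copies
        for b, weight in enumerate(w_row):
            g[s + b][s] = weight         # each copy sits one word line further along
    return g


def bit_line_currents(voltages, g):
    """Idealized readout: each bit line sums voltage times weight over its cells."""
    return [sum(v * g_row[s] for v, g_row in zip(voltages, g))
            for s in range(len(g[0]))]


if __name__ == "__main__":
    g = replicate_row([1, 2], 4)
    print(bit_line_currents([1, 0, 2, 3], g))
```

With N word lines and a kernel row of length c, the original plus its N−c copies give N−c+1 bit-line outputs, one per horizontal position of the kernel row.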