US 11,940,907 B2
Methods and apparatus for sparse tensor storage for neural network accelerators
Martin-Thomas Grymel, Leixlip (IE); David Bernard, Kilcullen (IE); Niall Hanrahan, Corrandulla (IE); Martin Power, Dublin (IE); Kevin Brady, Newry (GB); Gary Baugh, Bray (IE); and Cormac Brick, San Francisco, CA (US)
Assigned to INTEL CORPORATION, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Jun. 25, 2021, as Appl. No. 17/359,217.
Prior Publication US 2021/0406164 A1, Dec. 30, 2021
Int. Cl. G06F 12/00 (2006.01); G06F 12/02 (2006.01); G06N 3/10 (2006.01)
CPC G06F 12/0207 (2013.01) [G06F 12/0292 (2013.01); G06N 3/10 (2013.01)]
24 Claims
 
1. An apparatus comprising:
sparsity map generating circuitry to generate a sparsity map corresponding to a tensor, the sparsity map to indicate whether a data point of the tensor is zero;
static storage controlling circuitry to:
divide the tensor into one or more storage elements; and
reallocate an amount of memory for the tensor based on one or more storage elements exceeding a threshold sparsity;
rotation controlling circuitry to rotate the one or more storage elements to a target orientation based on a dimensionality of the tensor and a scaling factor including a number of bits in the data point;
compressor circuitry to:
perform a first compression of the one or more storage elements to generate one or more first compressed storage elements, the first compression to remove data points of the one or more storage elements based on the sparsity map; and
perform a second compression of the one or more first compressed storage elements, the second compression to store the one or more first compressed storage elements contiguously in memory of a first compute unit; and
post processing circuitry to:
determine a data point of a workload of the one or more second compressed storage elements to transmit;
determine a border region of the workload to transmit based on whether a data point of the workload is located within a border width and border height of the one or more second compressed storage elements; and
replicate the border region to a memory of a second compute unit.
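
The sparsity map and the two compression steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented circuitry, only a software analogy under stated assumptions: the tensor is a flat Python list, a storage element is a fixed-length slice of it, a map bit of 1 marks a non-zero data point (the claim does not fix the polarity), and all function names are hypothetical. The reallocation, rotation, and border replication steps are omitted.

```python
from typing import List, Tuple


def split_into_storage_elements(tensor: List[int], size: int) -> List[List[int]]:
    """Divide a flat tensor into storage elements of `size` data points (assumed layout)."""
    return [tensor[i:i + size] for i in range(0, len(tensor), size)]


def sparsity_map(element: List[int]) -> List[int]:
    """One bit per data point: 1 if non-zero, 0 if zero (polarity is an assumption)."""
    return [1 if value != 0 else 0 for value in element]


def first_compression(element: List[int], smap: List[int]) -> List[int]:
    """Remove the data points that the sparsity map marks as zero."""
    return [value for value, bit in zip(element, smap) if bit]


def second_compression(compressed: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Store the compressed elements back to back; return the memory and start offsets."""
    memory: List[int] = []
    offsets: List[int] = []
    for element in compressed:
        offsets.append(len(memory))
        memory.extend(element)
    return memory, offsets


def decompress(memory: List[int], offset: int, smap: List[int]) -> List[int]:
    """Rebuild one storage element from contiguous memory and its sparsity map."""
    restored: List[int] = []
    cursor = offset
    for bit in smap:
        if bit:
            restored.append(memory[cursor])
            cursor += 1
        else:
            restored.append(0)
    return restored


tensor = [0, 5, 0, 0, 7, 0, 0, 0, 3, 0, 0, 9]
elements = split_into_storage_elements(tensor, 4)
maps = [sparsity_map(e) for e in elements]
compressed = [first_compression(e, m) for e, m in zip(elements, maps)]
memory, offsets = second_compression(compressed)

print(maps)     # [[0, 1, 0, 0], [1, 0, 0, 0], [1, 0, 0, 1]]
print(memory)   # [5, 7, 3, 9]
print(offsets)  # [0, 1, 2]
```

In this sketch the twelve original data points shrink to four stored values plus twelve map bits, and each storage element can be restored from its offset and sparsity map alone.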