US 12,190,158 B2
Using sparsity metadata to reduce systolic array power consumption
Jorge Parra, El Dorado Hills, CA (US); Supratim Pal, Folsom, CA (US); Jiasheng Chen, El Dorado Hills, CA (US); and Chandra Gurram, Folsom, CA (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Jun. 25, 2021, as Appl. No. 17/358,542.
Prior Publication US 2022/0413924 A1, Dec. 29, 2022
Int. Cl. G06F 7/50 (2006.01); G06F 1/329 (2019.01); G06F 7/523 (2006.01); G06F 7/544 (2006.01); G06F 9/38 (2018.01); G06F 9/50 (2006.01); G06F 15/80 (2006.01); G06F 17/16 (2006.01); G06T 1/20 (2006.01)
CPC G06F 9/5027 (2013.01) [G06F 7/50 (2013.01); G06F 7/523 (2013.01); G06F 9/5094 (2013.01); G06F 15/8046 (2013.01); G06T 1/20 (2013.01)]
20 Claims
OG exemplary drawing
 
1. A processing apparatus including:
a general-purpose parallel processing engine comprising a matrix accelerator including one or more systolic arrays, at least one of the one or more systolic arrays comprising multiple pipeline stages, each pipeline stage of the multiple pipeline stages including multiple processing elements, the multiple processing elements associated with multiple processing channels, wherein the multiple processing elements are configured to:
receive output sparsity metadata at a first pipeline stage, the output sparsity metadata associated with the multiple processing channels, wherein the output sparsity metadata is independent of input sparsity of input matrix elements;
perform processing operations on the input matrix elements based on the output sparsity metadata, wherein to perform the processing operations includes to:
bypass multiplication at a first processing element associated with a first processing channel and power gate a portion of the first processing element; and
multiply input elements at a second processing element associated with a second processing channel.
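The behavior recited in claim 1 can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not Intel's implementation. The function `run_systolic_array`, the `PEStats` counters, and the `output_mask` encoding are all hypothetical. In this model each pipeline stage contributes one multiply-accumulate per processing channel. Per-channel output sparsity metadata (`False` meaning the channel's output is marked sparse) is received at the first stage and forwarded unchanged through the later stages. The mask is supplied by the caller and never derived from the input values, mirroring the claim's requirement that it be independent of input sparsity. Power gating is modeled only as a counter of skipped processing-element slots.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class PEStats:
    """Hypothetical counters standing in for per-PE activity and power gating."""
    multiplies: int = 0  # PE slots that performed a multiply
    gated: int = 0       # PE slots that bypassed the multiply (modeled as power gated)


def run_systolic_array(
    a: Sequence[float],
    b: Sequence[Sequence[float]],
    output_mask: Sequence[bool],
) -> Tuple[List[float], PEStats]:
    """Model a systolic array with len(a) pipeline stages and len(output_mask) channels.

    a           -- one input element per pipeline stage
    b           -- depth x channels matrix of second operands
    output_mask -- output sparsity metadata, one flag per channel; False marks
                   the channel's output as sparse (not needed). It is supplied
                   externally and does not depend on the values in a or b.
    """
    depth, channels = len(a), len(output_mask)
    if len(b) != depth or any(len(row) != channels for row in b):
        raise ValueError("b must be depth x channels")

    acc = [0.0] * channels
    stats = PEStats()
    metadata = list(output_mask)  # metadata received at the first pipeline stage

    for stage in range(depth):
        for ch in range(channels):
            if not metadata[ch]:
                # Bypass the multiplication for this channel; a hardware PE
                # would also power gate part of its datapath here.
                stats.gated += 1
                continue
            # Active channel: multiply the input elements and accumulate.
            acc[ch] += a[stage] * b[stage][ch]
            stats.multiplies += 1
        # The same metadata is forwarded to the next stage, so every stage
        # skips the same channels.

    return acc, stats


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0]
    b = [[1, 0, 2, 1], [0, 1, 1, 2], [1, 1, 0, 3]]
    acc, stats = run_systolic_array(a, b, [True, False, True, False])
    print(acc, stats)
```

In this example, channels 1 and 3 are marked sparse. Their six PE slots (three stages times two channels) bypass multiplication, and their outputs stay at zero. A design consequence the model makes visible is that, because the metadata is not derived from the operands, an all-zero input vector with every channel enabled still performs every multiply. The power saving comes only from the output-side metadata.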