US 12,405,787 B2
Utilizing structured sparsity in systolic arrays
Subramaniam Maiyuran, Gold River, CA (US); Jorge Parra, El Dorado Hills, CA (US); Ashutosh Garg, Folsom, CA (US); Chandra Gurram, Folsom, CA (US); Chunhui Mei, San Diego, CA (US); Durgesh Borkar, Folsom, CA (US); Shubra Marwaha, Folsom, CA (US); Supratim Pal, Folsom, CA (US); Varghese George, Folsom, CA (US); Wei Xiong, Fremont, CA (US); Yan Li, San Diego, CA (US); Yongsheng Liu, San Diego, CA (US); Dipankar Das, Pune (IN); Sasikanth Avancha, Bangalore (IN); Dharma Teja Vooturi, Jagtial (IN); and Naveen K. Mellempudi, Bangalore (IN)
Assigned to INTEL CORPORATION, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Mar. 29, 2024, as Appl. No. 18/621,539.
Application 18/621,539 is a continuation of application No. 17/107,823, filed on Nov. 30, 2020, granted, now 11,977,885.
Prior Publication US 2024/0320000 A1, Sep. 26, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 9/30 (2018.01); G06F 9/38 (2018.01); G06F 15/80 (2006.01)
CPC G06F 9/30036 (2013.01) [G06F 9/3001 (2013.01); G06F 9/30101 (2013.01); G06F 9/3893 (2013.01); G06F 15/8046 (2013.01)] 20 Claims
 
1. An apparatus comprising:
a processor comprising a systolic array to:
execute an instruction for sparse systolic dot product accumulate;
read at least portions of elements of a plurality of source registers referenced by the instruction, wherein the plurality of source registers comprise a first source register having metadata corresponding to structured source data, a second source register having unpacked source data, and a third source register having the structured source data packed based on sparsity as packed source data;
provide a first subset of elements of the packed source data to at least one stage of the systolic array, the at least one stage comprising dot product circuitry;
select, using the metadata, a second subset of elements of the unpacked source data to utilize the at least one stage of the systolic array, the second subset of elements corresponding to the first subset of elements; and
perform, at the at least one stage of the systolic array, dot product accumulate operations.
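Claim 1 states the data flow only at a high level. The Python sketch below is a software model of that flow, not Intel's circuit design, and it rests on several loudly stated assumptions. It assumes 2:4 structured sparsity, meaning at most two nonzeros in every group of four elements. It assumes the metadata holds one position index (0 to 3) for each packed element. It assumes each systolic stage consumes one group. The helper names `pack_2to4` and `sparse_systolic_dpas` are hypothetical. Inside each stage, the packed values play the role of the "first subset", and the metadata acts as a multiplexer select that picks the matching "second subset" from the unpacked operand.

```python
from typing import List, Tuple

GROUP = 4  # assumed group size of the structured sparsity pattern (2:4)
KEEP = 2   # assumed number of elements retained per group


def pack_2to4(dense: List[float]) -> Tuple[List[float], List[int]]:
    """Hypothetical packer: returns (packed values, per-element position metadata)."""
    if len(dense) % GROUP:
        raise ValueError("length must be a multiple of GROUP")
    packed: List[float] = []
    metadata: List[int] = []
    for g in range(0, len(dense), GROUP):
        group = dense[g:g + GROUP]
        nonzero = [i for i, v in enumerate(group) if v != 0]
        if len(nonzero) > KEEP:
            raise ValueError("group violates the assumed 2:4 structure")
        # Pad with zero-valued positions so every group yields exactly KEEP entries.
        fill = [i for i in range(GROUP) if i not in nonzero][: KEEP - len(nonzero)]
        positions = sorted(nonzero + fill)
        packed.extend(group[i] for i in positions)
        metadata.extend(positions)
    return packed, metadata


def sparse_systolic_dpas(acc: float, packed: List[float],
                         metadata: List[int], unpacked: List[float]) -> float:
    """Hypothetical model: one systolic stage per group, metadata selects operands."""
    for stage in range(len(packed) // KEEP):
        # First subset: the packed (nonzero) elements fed to this stage.
        p = packed[stage * KEEP:(stage + 1) * KEEP]
        m = metadata[stage * KEEP:(stage + 1) * KEEP]
        # Window of the unpacked operand aligned with this group.
        window = unpacked[stage * GROUP:(stage + 1) * GROUP]
        # Second subset: the unpacked elements chosen by the metadata.
        selected = [window[i] for i in m]
        # Dot product accumulate for this stage.
        acc += sum(a * b for a, b in zip(p, selected))
    return acc


if __name__ == "__main__":
    packed, meta = pack_2to4([0, 3, 0, 2, 1, 0, 0, 4])
    print(packed, meta)  # [3, 2, 1, 4] [1, 3, 0, 3]
    print(sparse_systolic_dpas(10, packed, meta, [1, 2, 3, 4, 5, 6, 7, 8]))  # 61
```

The result matches the dense computation 10 + (3·2 + 2·4 + 1·5 + 4·8) = 61, even though only half of the multiplications are performed. This illustrates why the metadata-driven selection lets a stage skip the zero elements without changing the accumulated value.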