US 12,455,746 B2
Method and apparatus for performing matrix multiplication
Shashi Kiran Chilappagari, San Jose, CA (US); and Winston Lee, Palo Alto, CA (US)
Assigned to DeGirum Corporation, Santa Clara, CA (US)
Filed by DeGirum Corporation, Santa Clara, CA (US)
Filed on Sep. 9, 2021, as Appl. No. 17/470,675.
Application 17/470,675 is a division of application No. 16/397,401, filed on Apr. 29, 2019, abandoned.
Prior Publication US 2021/0406030 A1, Dec. 30, 2021
Int. Cl. G06F 9/38 (2018.01); G06F 9/30 (2018.01); G06F 17/16 (2006.01)
CPC G06F 9/3887 (2013.01) [G06F 9/30109 (2013.01); G06F 9/30141 (2013.01); G06F 9/3824 (2013.01); G06F 17/16 (2013.01)]
24 Claims
1. A method of performing matrix multiplication of a first matrix and a second matrix using a computer system including N single instruction multiple data (SIMD) engines and N corresponding output register sets, wherein N is a number equal to or greater than two, and wherein each of the output register sets includes a corresponding plurality of output registers, the method comprising:
identifying a plurality of non-zero entries included in the first matrix, wherein each of the identified plurality of non-zero entries has a corresponding column address and a corresponding row address within the first matrix;
for each of the identified plurality of non-zero entries, using the corresponding row address of the non-zero entry to identify a corresponding one of the N SIMD engines and a corresponding one of the N output register sets to process the non-zero entry;
sorting the identified plurality of non-zero entries to select N non-zero entries, each having a different identified corresponding one of the N output register sets;
routing each of the selected N non-zero entries to the identified corresponding one of the N SIMD engines to perform multiplication operations with entries of the second matrix, wherein each of the N SIMD engines generates a plurality of products;
for each of the selected N non-zero entries, using the corresponding row address of the non-zero entry to identify one of the plurality of output registers within the identified corresponding one of the N output register sets; and
for each of the selected N non-zero entries, using the identified corresponding one of the N SIMD engines to perform accumulate operations by accessing the identified one of the output registers within the identified corresponding one of the N output register sets.
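The steps of claim 1 can be illustrated with a minimal software sketch. It is not the patented hardware and makes several assumptions the claim does not state: the row address r selects engine and output register set r mod N, the register within that set is r div N, each "SIMD engine" is modeled as a Python list comprehension that multiplies one non-zero entry by the matching row of the second matrix, and the function name `sparse_matmul` and its cycle counter are hypothetical.

```python
from collections import deque


def sparse_matmul(a, b, n_engines):
    """Multiply sparse matrix a by dense matrix b (both lists of rows).

    Returns (result, cycles), where cycles counts how many batches of
    non-zero entries were dispatched to the modeled engines.
    """
    n_rows, width = len(a), len(b[0])
    regs_per_set = -(-n_rows // n_engines)  # ceiling division

    # One output register set per engine; each register holds one output row.
    # Assumed mapping: row r lives in set r % N, register r // N.
    output_sets = [
        [[0] * width for _ in range(regs_per_set)] for _ in range(n_engines)
    ]

    # Identify the non-zero entries of the first matrix and use each entry's
    # row address to pick its engine / output register set.
    queues = [deque() for _ in range(n_engines)]
    for r, row in enumerate(a):
        for c, value in enumerate(row):
            if value != 0:
                queues[r % n_engines].append((r, c, value))

    cycles = 0
    while any(queues):
        # Select entries that all target different output register sets:
        # at most one per engine (fewer than N when a queue has run dry).
        batch = [q.popleft() for q in queues if q]
        for r, c, value in batch:
            register = output_sets[r % n_engines][r // n_engines]
            # The engine multiplies the entry by row c of the second matrix...
            products = [value * x for x in b[c]]
            # ...and accumulates the products into the selected register.
            for j, p in enumerate(products):
                register[j] += p
        cycles += 1

    # Read the output rows back in row order.
    result = [output_sets[r % n_engines][r // n_engines] for r in range(n_rows)]
    return result, cycles
```

Because every entry in a batch targets a different output register set, the accumulations within one batch never touch the same register, which is why they could proceed in parallel on separate engines. In this sketch the batches are simply processed in a loop.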