US 12,229,215 B2
Performing matrix multiplication in a streaming processor
Yun Du, San Diego, CA (US); Gang Zhong, San Diego, CA (US); Fei Wei, San Diego, CA (US); Yibin Zhang, San Diego, CA (US); Jing Han, San Jose, CA (US); Hongjiang Shang, San Diego, CA (US); Elina Kamenetskaya, Belmont, MA (US); Minjie Huang, San Diego, CA (US); Alexei Vladimirovich Bourd, San Diego, CA (US); Chun Yu, Rancho Santa Fe, CA (US); Andrew Evan Gruber, Arlington, MA (US); and Eric Demers, San Diego, CA (US)
Assigned to QUALCOMM Incorporated, San Diego, CA (US)
Filed by QUALCOMM Incorporated, San Diego, CA (US)
Filed on Oct. 16, 2023, as Appl. No. 18/487,918.
Application 18/487,918 is a continuation of application No. 17/137,226, filed on Dec. 29, 2020, granted, now 11,829,439.
Claims priority of provisional application 62/955,311, filed on Dec. 30, 2019.
Prior Publication US 2024/0037183 A1, Feb. 1, 2024
Int. Cl. G06F 17/16 (2006.01); G06F 7/57 (2006.01); G06F 9/30 (2018.01); G06F 9/38 (2018.01)
CPC G06F 17/16 (2013.01) [G06F 7/57 (2013.01); G06F 9/30036 (2013.01); G06F 9/3851 (2013.01); G06F 9/3887 (2013.01)]
18 Claims
 
1. An apparatus for data processing, comprising:
at least one memory; and
at least one processor coupled to the at least one memory and configured to:
fetch an element of an input matrix from graphics memory;
determine whether the element of the input matrix is to be used across multiple threads; and
store the element of the input matrix at a buffer until a workgroup corresponding to at least one of the multiple threads is executed in response to a first determination that the element of the input matrix is to be used across multiple threads, or store the element of the input matrix at a general purpose register (GPR) in response to a second determination that the element of the input matrix is not to be used across multiple threads.
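The storage decision recited in claim 1 can be illustrated with a minimal sketch. This is a toy software model under stated assumptions, not the claimed hardware: the class name StreamingProcessorModel, the methods place and release_after_workgroup, and the use of a set of thread IDs to represent "used across multiple threads" are all hypothetical and do not appear in the patent. The sketch also assumes every fetched element has at least one consuming thread.

```python
from dataclasses import dataclass, field


@dataclass
class StreamingProcessorModel:
    """Toy model of the claimed placement step (all names are hypothetical)."""

    # Elements shared by several threads, keyed by (matrix, row, col).
    shared_buffer: dict = field(default_factory=dict)
    # Per-thread general purpose registers: thread_id -> {key: value}.
    gprs: dict = field(default_factory=dict)

    def place(self, key, value, consumer_threads):
        """Store one fetched element and report where it went."""
        if len(consumer_threads) > 1:
            # First determination: the element is used across multiple
            # threads, so it is held in the buffer.
            self.shared_buffer[key] = value
            return "buffer"
        # Second determination: the element is not used across multiple
        # threads, so it goes to the single consuming thread's GPR.
        (thread_id,) = consumer_threads
        self.gprs.setdefault(thread_id, {})[key] = value
        return "gpr"

    def release_after_workgroup(self, key):
        """Drop a buffered element once the consuming workgroup has executed."""
        self.shared_buffer.pop(key, None)


sp = StreamingProcessorModel()
where_shared = sp.place(("B", 0, 0), 2.0, {0, 1, 2, 3})  # used by four threads
where_private = sp.place(("A", 0, 0), 5.0, {0})          # used by one thread
```

In this sketch the element consumed by threads 0 through 3 lands in the buffer and stays there until release_after_workgroup is called, while the element consumed only by thread 0 is written to that thread's GPR entry.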