US 12,242,564 B2
Method and system for performing multi-device based inference for large language model
Seongmin Hong, Hwaseong-si (KR); Junsoo Kim, Hwaseong-si (KR); and Gyubin Choi, Hwaseong-si (KR)
Assigned to HyperAccel Co., Ltd., Hwaseong-si (KR)
Filed by HyperAccel Co., Ltd., Hwaseong-si (KR)
Filed on Jun. 11, 2024, as Appl. No. 18/740,201.
Claims priority of application No. 10-2023-0074834 (KR), filed on Jun. 12, 2023.
Prior Publication US 2024/0411835 A1, Dec. 12, 2024
Int. Cl. G06F 17/16 (2006.01)
CPC G06F 17/16 (2013.01) 12 Claims
OG exemplary drawing
 
1. A device included in a multi-device-based inference performance system, the device comprising:
a matrix processing unit configured to perform matrix multiplication on data;
a sub-result storage configured to store, in a memory, a first sub-result that is calculated in real time by the matrix processing unit while the matrix multiplication is being performed;
a transmitter configured to read the first sub-result stored in the memory and to transmit the same to at least one other device included in the inference performance system while the matrix multiplication is being performed;
a receiver configured to receive, from the at least one other device, a second sub-result calculated by each of the at least one other device and to store the same in the memory; and
a synchronizer configured to synchronize data using the first sub-result and the second sub-result while the matrix multiplication is being performed,
wherein performance of the matrix multiplication, transmission of the first sub-result, and reception of the second sub-result are implemented to be performed simultaneously.
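Claim 1 describes overlapping tiled matrix multiplication with transmission and reception of partial results across devices. A minimal sketch of that overlap pattern, assuming a thread-per-device simulation in which each row of the product is a "sub-result" that is transmitted (put on a queue) as soon as it is computed, while the remaining rows are still being multiplied (the function names and the row-wise partitioning here are illustrative assumptions, not the patented hardware):

```python
import threading
import queue

def matmul_rows(A_rows, B):
    # plain-Python matrix multiply of the given rows of A against B
    k, n = len(B), len(B[0])
    return [[sum(a[i] * B[i][j] for i in range(k)) for j in range(n)]
            for a in A_rows]

def device(rows, row_ids, B, out_q):
    # each simulated device computes its sub-result rows one at a time and
    # "transmits" each row the moment it is ready, so transmission overlaps
    # with the multiplication of the remaining rows
    for rid, row in zip(row_ids, rows):
        out_q.put((rid, matmul_rows([row], B)[0]))

def multi_device_matmul(A, B, num_devices=2):
    # partition the rows of A round-robin across the simulated devices
    q = queue.Queue()
    n = len(A)
    chunks = [list(range(d, n, num_devices)) for d in range(num_devices)]
    threads = [threading.Thread(target=device,
                                args=([A[i] for i in ids], ids, B, q))
               for ids in chunks]
    for t in threads:
        t.start()
    # "synchronizer": gather every device's sub-results into the full product
    result = [None] * n
    for _ in range(n):
        rid, row = q.get()
        result[rid] = row
    for t in threads:
        t.join()
    return result
```

For example, `multi_device_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`; the queue stands in for the claim's transmitter/receiver pair, and draining it plays the role of the synchronizer.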