US 12,136,138 B2
Neural network training with acceleration
Shiyu Li, Durham, NC (US); Krishna T. Malladi, San Jose, CA (US); Andrew Chang, Los Altos, CA (US); and Yang Seok Ki, Palo Alto, CA (US)
Assigned to Samsung Electronics Co., Ltd., Yongin-si (KR)
Filed by Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed on Feb. 11, 2022, as Appl. No. 17/670,044.
Claims priority of provisional application 63/278,799, filed on Nov. 12, 2021.
Claims priority of provisional application 63/278,381, filed on Nov. 11, 2021.
Prior Publication US 2023/0147472 A1, May 11, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06T 1/60 (2006.01); G06F 13/40 (2006.01); G06F 16/22 (2019.01); G06T 1/20 (2006.01)

CPC G06T 1/60 (2013.01) [G06F 13/4022 (2013.01); G06F 16/2237 (2019.01); G06F 16/2282 (2019.01); G06T 1/20 (2013.01)]

20 Claims

1. A system, comprising:

a graphics processing unit cluster; and

a computational storage cluster connected to the graphics processing unit cluster by a cache-coherent system interconnect,

wherein:

the graphics processing unit cluster comprises one or more graphics processing units,

the computational storage cluster comprises one or more computational storage devices, and

a first computational storage device of the one or more computational storage devices is configured to:

store an embedding table;

receive, from an interface associated with the cache-coherent system interconnect, an index vector comprising a first index and a second index; and

calculate an embedded vector based on:

a first row of the embedding table, corresponding to the first index, and

a second row of the embedding table, corresponding to the second index.
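
For illustration only, the lookup recited in claim 1 may be sketched as follows. The claim states only that the embedded vector is calculated "based on" the two selected rows. The element-wise sum used here is an assumption, and the names `embed`, `embedding_table`, and `index_vector` are hypothetical and do not appear in the claim. The sketch runs on a host in plain Python and does not model the computational storage device or the cache-coherent system interconnect.

```python
from typing import List, Sequence


def embed(table: Sequence[Sequence[float]], index_vector: Sequence[int]) -> List[float]:
    """Combine the table rows selected by index_vector into one embedded vector.

    Assumption: rows are combined by element-wise sum. The claim only says the
    embedded vector is calculated "based on" the selected rows.
    """
    if not index_vector:
        raise ValueError("index vector must contain at least one index")
    width = len(table[0])
    result = [0.0] * width
    for idx in index_vector:
        row = table[idx]  # row of the embedding table corresponding to this index
        for j in range(width):
            result[j] += row[j]
    return result


# Hypothetical 3-row embedding table; each row is one embedding.
embedding_table = [
    [1.0, 2.0, 3.0],
    [10.0, 20.0, 30.0],
    [100.0, 200.0, 300.0],
]

# Index vector comprising a first index (0) and a second index (2).
embedded = embed(embedding_table, [0, 2])
print(embedded)  # [101.0, 202.0, 303.0]
```

Under the same assumption, a different reduction (for example a mean or a concatenation of the selected rows) could be substituted by changing only the loop body.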