US 12,136,138 B2
Neural network training with acceleration
Shiyu Li, Durham, NC (US); Krishna T. Malladi, San Jose, CA (US); Andrew Chang, Los Altos, CA (US); and Yang Seok Ki, Palo Alto, CA (US)
Assigned to Samsung Electronics Co., Ltd., Yongin-si (KR)
Filed by Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed on Feb. 11, 2022, as Appl. No. 17/670,044.
Claims priority of provisional application 63/278,799, filed on Nov. 12, 2021.
Claims priority of provisional application 63/278,381, filed on Nov. 11, 2021.
Prior Publication US 2023/0147472 A1, May 11, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06T 1/60 (2006.01); G06F 13/40 (2006.01); G06F 16/22 (2019.01); G06T 1/20 (2006.01)
CPC G06T 1/60 (2013.01) [G06F 13/4022 (2013.01); G06F 16/2237 (2019.01); G06F 16/2282 (2019.01); G06T 1/20 (2013.01)]
20 Claims
 
1. A system, comprising:
a graphics processing unit cluster; and
a computational storage cluster connected to the graphics processing unit cluster by a cache-coherent system interconnect,
wherein:
the graphics processing unit cluster comprises one or more graphics processing units,
the computational storage cluster comprises one or more computational storage devices, and
a first computational storage device of the one or more computational storage devices is configured to:
store an embedding table;
receive, from an interface associated with the cache-coherent system interconnect, an index vector comprising a first index and a second index; and
calculate an embedded vector based on:
a first row of the embedding table, corresponding to the first index, and
a second row of the embedding table, corresponding to the second index.
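
The lookup recited in claim 1 can be illustrated with a minimal sketch. The claim states only that the embedded vector is calculated "based on" the two rows; the element-wise sum used here is an assumption chosen for illustration, not something the claim specifies. The class name `ComputationalStorageDevice`, the method name `calculate_embedded_vector`, and the table contents are hypothetical, and the cache-coherent interconnect is not modeled: the index vector is simply passed as a function argument.

```python
from typing import List, Sequence


class ComputationalStorageDevice:
    """Hypothetical stand-in for one computational storage device."""

    def __init__(self, embedding_table: Sequence[Sequence[float]]) -> None:
        # "store an embedding table": one row per index value.
        self.embedding_table = [list(row) for row in embedding_table]

    def calculate_embedded_vector(self, index_vector: Sequence[int]) -> List[float]:
        # Assumption: the selected rows are combined by element-wise sum.
        if not index_vector:
            raise ValueError("index vector must contain at least one index")
        width = len(self.embedding_table[0])
        embedded = [0.0] * width
        for index in index_vector:
            if not 0 <= index < len(self.embedding_table):
                raise IndexError(f"index {index} is outside the embedding table")
            row = self.embedding_table[index]  # row corresponding to this index
            for j, value in enumerate(row):
                embedded[j] += value
        return embedded


# Illustrative data only: a three-row table with three-element rows.
table = [
    [1.0, 2.0, 3.0],
    [10.0, 20.0, 30.0],
    [100.0, 200.0, 300.0],
]
device = ComputationalStorageDevice(table)

# Index vector comprising a first index (0) and a second index (2).
embedded_vector = device.calculate_embedded_vector([0, 2])
print(embedded_vector)  # [101.0, 202.0, 303.0]
```

In this sketch only the short index vector and the resulting embedded vector cross the device boundary, while the rows of the embedding table stay on the device that stores them.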