US 12,182,577 B2
Neural-processing unit tile for shuffling queued nibbles for multiplication with non-zero weight nibbles
Ilia Ovsiannikov, Porter Ranch, CA (US); Ali Shafiee Ardestani, San Jose, CA (US); Hamzah Ahmed Ali Abdelaziz, San Jose, CA (US); and Joseph H. Hassoun, Los Gatos, CA (US)
Assigned to Samsung Electronics Co., Ltd., Yongin-si (KR)
Filed by Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed on Apr. 13, 2020, as Appl. No. 16/847,504.
Claims priority of provisional application 62/841,606, filed on May 1, 2019.
Prior Publication US 2020/0349106 A1, Nov. 5, 2020
Int. Cl. G06F 9/38 (2018.01); G06F 15/80 (2006.01)
CPC G06F 9/3885 (2013.01) [G06F 15/80 (2013.01)]
19 Claims
 
1. A processor, comprising:
a first tile,
a second tile,
a memory, and
a bus,
the bus being connected to:
the memory,
the first tile, and
the second tile,
the first tile comprising:
a first weight register storing a first weight, the first weight being zero;
a second weight register storing a third weight, the third weight being an eight-bit number neither nibble of which is zero;
a third weight register storing a second weight, the second weight being zero;
a fourth weight register storing a fourth weight, the fourth weight being an eight-bit number neither nibble of which is zero;
an activations buffer;
a first shuffler;
a first multiplier connected to the first weight register;
a second multiplier connected to the second weight register;
a third multiplier connected to the third weight register; and
a fourth multiplier connected to the fourth weight register,
the activations buffer being configured to include:
a first queue,
a second queue,
a third queue, and
a fourth queue,
the first tile being configured:
to feed a first nibble from the third queue, through the first shuffler, to the first multiplier, and to multiply, in the first multiplier, the first nibble from the third queue by a first nibble of the third weight;
to feed a second nibble from the third queue, through the first shuffler, to the second multiplier, and to multiply, in the second multiplier, the second nibble from the third queue by a second nibble of the third weight;
to feed a first nibble from the fourth queue, through the first shuffler, to the third multiplier, and to multiply, in the third multiplier, the first nibble from the fourth queue by a first nibble of the fourth weight; and
to feed a second nibble from the fourth queue, through the first shuffler, to the fourth multiplier, and to multiply, in the fourth multiplier, the second nibble from the fourth queue by a second nibble of the fourth weight.
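
The routing recited in claim 1 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware. The function names `split_nibbles` and `route_and_multiply`, the list-based queues, and the choice of the low four bits as the "first" nibble are all hypothetical; the claim does not specify nibble order or how a weight nibble reaches a given multiplier. The model shows only which activation nibble is paired with which weight nibble in each of the four multipliers.

```python
def split_nibbles(byte):
    """Split an 8-bit value into (first, second) nibbles.

    Assumption: "first" is the low 4 bits and "second" is the high 4 bits.
    The claim itself does not fix this ordering.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    return byte & 0xF, (byte >> 4) & 0xF


def route_and_multiply(queues, third_weight, fourth_weight):
    """Model the four multiplications recited in claim 1.

    queues        -- four lists of 4-bit activation nibbles (hypothetical
                     stand-in for the activations buffer); index 2 is the
                     third queue and index 3 is the fourth queue
    third_weight  -- 8-bit weight, neither nibble zero
    fourth_weight -- 8-bit weight, neither nibble zero

    Returns the products of multipliers 1 to 4, in that order.
    """
    w3_first, w3_second = split_nibbles(third_weight)
    w4_first, w4_second = split_nibbles(fourth_weight)
    if 0 in (w3_first, w3_second, w4_first, w4_second):
        raise ValueError("claim 1 recites weights with no zero nibble")

    third_queue, fourth_queue = queues[2], queues[3]

    # The shuffler is modeled as plain index selection: two nibbles from the
    # third queue go to multipliers 1 and 2, and two nibbles from the fourth
    # queue go to multipliers 3 and 4.
    return [
        third_queue[0] * w3_first,    # first multiplier
        third_queue[1] * w3_second,   # second multiplier
        fourth_queue[0] * w4_first,   # third multiplier
        fourth_queue[1] * w4_second,  # fourth multiplier
    ]


# Example: the first and second queues are unused by these four steps.
products = route_and_multiply([[], [], [3, 5], [2, 7]], 0x21, 0x43)
print(products)
```

In this model all four multipliers operate on nibbles of the two non-zero weights, although the first and third weight registers hold zero weights.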