US 11,899,746 B2
Circuitry for high-bandwidth, low-latency machine learning
Martin Langhammer, Alderbury (GB); and Andrei-Mihai Hagiescu-Miriste, East York (CA)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Dec. 23, 2021, as Appl. No. 17/560,950.
Application 17/560,950 is a continuation of application No. 16/369,973, filed on Mar. 29, 2019, granted, now 11,216,532.
Claims priority of provisional application 62/736,981, filed on Sep. 26, 2018.
Prior Publication US 2022/0114236 A1, Apr. 14, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 17/16 (2006.01); G06N 5/00 (2023.01); G06N 3/063 (2023.01)
CPC G06F 17/16 (2013.01) [G06N 5/00 (2013.01); G06N 3/063 (2013.01)]
20 Claims
11. An integrated circuit device comprising:
a cascade chain implemented using a plurality of digital signal processing (DSP) blocks, wherein the cascade chain is configurable to determine a dot-product based at least in part on first data associated with a first matrix and second data associated with a second matrix, wherein the first data comprises a first vector and a second vector of the first matrix, wherein the second data comprises a third vector of the second matrix, wherein a first DSP block of the plurality of DSP blocks comprises:
input circuitry configurable to receive a first vector of a first matrix, a second vector of the first matrix, and a third vector of a second matrix;
a register communicatively coupled to the input circuitry, wherein the register is configurable to store at least a portion of the second vector;
multiplier circuitry comprising:
a first multiplier configurable to:
receive the first vector directly from the input circuitry and receive the second vector from the register; and
determine a first partial inner-product based at least in part on the first vector and the second vector; and
a second multiplier configurable to determine a second partial inner-product based at least in part on the second vector and the third vector; and
adder circuitry configurable to determine a first inner-product of a dot-product based at least in part on the first partial inner-product; and
an embedded memory communicatively coupled to an output of the cascade chain and configurable to:
receive the dot-product from the cascade chain; and
store the dot-product.
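
The arrangement recited in claim 11 can be loosely illustrated with a small behavioral model. This is a minimal sketch under stated assumptions, not the patented circuit: the names `DspBlock`, `cascade_dot`, and `matmul_to_memory` are hypothetical, each block is assumed to hold two operands of the first matrix in its register while two operands of the second matrix arrive at its inputs, and a Python dictionary stands in for the embedded memory. The exact operand routing recited in the claim is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class DspBlock:
    """Hypothetical behavioral model of one DSP block with two multipliers."""

    held: tuple = (0, 0)  # register contents: operands held inside the block

    def load(self, a0, a1):
        # Store a portion of a first-matrix vector in the block's register.
        self.held = (a0, a1)

    def step(self, b0, b1, cascade_in=0):
        # Each of the two multipliers forms one partial inner-product.
        p0 = self.held[0] * b0
        p1 = self.held[1] * b1
        # The adder combines both partial products with the cascade input.
        return cascade_in + p0 + p1


def cascade_dot(row, col):
    """Chain blocks so that each one handles two element pairs (assumed split)."""
    if len(row) != len(col):
        raise ValueError("vectors must have equal length")
    row, col = list(row), list(col)
    # Pad odd-length vectors with a zero so every block gets two operand pairs.
    if len(row) % 2:
        row.append(0)
        col.append(0)
    acc = 0
    for i in range(0, len(row), 2):
        block = DspBlock()
        block.load(row[i], row[i + 1])
        # The running sum is passed down the chain as the cascade input.
        acc = block.step(col[i], col[i + 1], cascade_in=acc)
    return acc


def matmul_to_memory(a, b):
    """Store every dot-product in a dict that stands in for embedded memory."""
    memory = {}
    cols = list(zip(*b))  # columns of the second matrix
    for r, row in enumerate(a):
        for c, col in enumerate(cols):
            memory[(r, c)] = cascade_dot(row, col)
    return memory
```

For example, `matmul_to_memory([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]])` yields the entries 58, 64, 139, and 154 at keys `(0, 0)`, `(0, 1)`, `(1, 0)`, and `(1, 1)`, which matches the ordinary matrix product. The model is purely sequential and does not capture the timing, pipelining, or bandwidth behavior of the hardware.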