US 12,260,214 B1
Throughput increase for compute engine
Paul Gilbert Meyer, Jericho, VT (US); Ron Diamant, San Jose, CA (US); Sundeep Amirineni, Cedar Park, TX (US); and Sunil Kumar Bathula, Austin, TX (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Sep. 30, 2022, as Appl. No. 17/937,332.
Int. Cl. G06F 9/38 (2018.01); G06F 9/30 (2018.01); G06F 13/16 (2006.01)

CPC G06F 9/30036 (2013.01) [G06F 13/1668 (2013.01)]

21 Claims

1. A method comprising:

receiving an instruction to perform a computation on an input tensor stored in memory to generate an output tensor using a compute channel having multiple computational circuit blocks coupled in series in a pipeline;

determining that the computation uses half or less of the computational circuit blocks in the pipeline;

determining a datatype of data elements of the input tensor;

selecting a throughput increase mode from a plurality of throughput increase modes based on the datatype and a number of computational circuit blocks used by the computation, the plurality of throughput increase modes including:

a single port partial bitlength datatype mode that operates on a partial bitlength datatype and accesses concatenated data elements from the memory using a single port;

a multiport mode that operates on a full bitlength datatype or the partial bitlength datatype, and accesses data elements from the memory using multiple ports; and

a multiport partial bitlength datatype mode that operates on the partial bitlength datatype and accesses concatenated data elements from the memory using the multiple ports;

splitting the input tensor into a plurality of input data streams; and

performing the computation on the plurality of input data streams in respective subsets of one or more computational circuit blocks to generate a plurality of output data streams corresponding to the output tensor in accordance with the selected throughput increase mode.
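
The steps recited in claim 1 can be illustrated with a minimal software sketch. It is not the claimed hardware and not the patentee's implementation. The claim does not state how a mode is chosen among the three, how wide the datatypes are, or how the tensor is divided, so the following are assumptions made only for illustration: a 32-bit full bitlength (`FULL_BITS`), two memory ports (`NUM_PORTS`), a selection policy that prefers the mode with the most streams that fit in the pipeline, and a round-robin split. All function and constant names are hypothetical. The streams are processed one after another here, whereas the claim has them processed in respective subsets of circuit blocks.

```python
from enum import Enum
from typing import Callable, List, Optional, Sequence

FULL_BITS = 32  # assumption: width of a full bitlength datatype
NUM_PORTS = 2   # assumption: memory ports used by the multiport modes


class Mode(Enum):
    SINGLE_PORT_PARTIAL = "single port partial bitlength datatype mode"
    MULTIPORT = "multiport mode"
    MULTIPORT_PARTIAL = "multiport partial bitlength datatype mode"


def select_mode(dtype_bits: int, blocks_used: int, total_blocks: int) -> Optional[Mode]:
    """Pick a mode from the datatype and the number of blocks used.

    Returns None when the computation uses more than half of the pipeline.
    The preference order below is an assumed policy, not taken from the claim.
    """
    if blocks_used < 1 or 2 * blocks_used > total_blocks:
        return None
    # Assumption: "partial" means two elements fit in one full-width access.
    partial = 2 * dtype_bits <= FULL_BITS
    max_streams = total_blocks // blocks_used
    if partial and max_streams >= 2 * NUM_PORTS:
        return Mode.MULTIPORT_PARTIAL
    if partial:
        return Mode.SINGLE_PORT_PARTIAL
    return Mode.MULTIPORT


def stream_count(mode: Mode) -> int:
    """Streams = elements per memory access times ports used (assumed model)."""
    per_access = 1 if mode is Mode.MULTIPORT else 2
    ports = 1 if mode is Mode.SINGLE_PORT_PARTIAL else NUM_PORTS
    return per_access * ports


def split(tensor: Sequence[float], n: int) -> List[List[float]]:
    """Round-robin split of a flat tensor into n input data streams."""
    return [list(tensor[i::n]) for i in range(n)]


def merge(streams: List[List[float]]) -> List[float]:
    """Inverse of split: interleave the output streams into one tensor."""
    n = len(streams)
    out: List[float] = [0.0] * sum(len(s) for s in streams)
    for i, s in enumerate(streams):
        out[i::n] = s
    return out


def run(tensor: Sequence[float], op: Callable[[float], float],
        dtype_bits: int, blocks_used: int, total_blocks: int) -> List[float]:
    """Apply op elementwise, using several streams when a mode applies."""
    mode = select_mode(dtype_bits, blocks_used, total_blocks)
    if mode is None:
        return [op(x) for x in tensor]  # no throughput increase mode
    streams = split(tensor, stream_count(mode))
    # Each stream stands in for one subset of computational circuit blocks.
    outputs = [[op(x) for x in s] for s in streams]
    return merge(outputs)
```

Under these assumptions, a 16-bit computation that uses 2 of 8 blocks is split into four streams, and the same computation using 4 of 8 blocks is split into two. Note that the claim allows the multiport mode to operate on either the full or the partial bitlength datatype; the assumed policy above selects it only for the full bitlength case.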