CPC G06F 9/30036 (2013.01) [G06F 13/1668 (2013.01)]
21 Claims
1. A method comprising:
receiving an instruction to perform a computation on an input tensor stored in memory to generate an output tensor using a compute channel having multiple computational circuit blocks coupled in series in a pipeline;
determining that the computation uses half or less of the computational circuit blocks in the pipeline;
determining a datatype of data elements of the input tensor;
selecting a throughput increase mode from a plurality of throughput increase modes based on the datatype and a number of computational circuit blocks used by the computation, the plurality of throughput increase modes including:
a single port partial bitlength datatype mode that operates on a partial bitlength datatype and accesses concatenated data elements from the memory using a single port;
a multiport mode that operates on a full bitlength datatype or the partial bitlength datatype, and accesses data elements from the memory using multiple ports; and
a multiport partial bitlength datatype mode that operates on the partial bitlength datatype and accesses concatenated data elements from the memory using the multiple ports;
splitting the input tensor into a plurality of input data streams; and
performing the computation on the plurality of input data streams in respective subsets of one or more computational circuit blocks to generate a plurality of output data streams corresponding to the output tensor in accordance with the selected throughput increase mode.
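The selecting and splitting steps recited above can be illustrated with a minimal sketch. The claim does not state the selection rule, so the policy below is an assumption: among the modes compatible with the datatype, it picks the one yielding the most parallel streams that still fit in the pipeline. The names `Mode`, `select_mode`, and `split_streams`, the two-port memory, the two concatenated elements per memory word, and the round-robin split are all hypothetical and are not taken from the claim.

```python
from enum import Enum
from typing import List, Optional, Sequence, Tuple


class Mode(Enum):
    SINGLE_PORT_PARTIAL = "single port, partial bitlength"
    MULTIPORT = "multiport"
    MULTIPORT_PARTIAL = "multiport, partial bitlength"


def select_mode(
    partial_bitlength: bool,
    blocks_used: int,
    blocks_total: int,
    ports: int = 2,            # assumed number of memory ports
    elems_per_word: int = 2,   # assumed concatenated elements per memory word
) -> Optional[Tuple[Mode, int]]:
    """Return (mode, stream_count), or None if no mode applies."""
    # Claimed precondition: the computation uses half or less of the blocks.
    if blocks_used < 1 or 2 * blocks_used > blocks_total:
        return None

    # Candidate modes with the stream count each would yield (assumed).
    candidates: List[Tuple[Mode, int]] = []
    if partial_bitlength:
        candidates.append((Mode.SINGLE_PORT_PARTIAL, elems_per_word))
    candidates.append((Mode.MULTIPORT, ports))  # full or partial bitlength
    if partial_bitlength:
        candidates.append((Mode.MULTIPORT_PARTIAL, ports * elems_per_word))

    # Keep only modes whose streams each get a full subset of blocks.
    fitting = [c for c in candidates if c[1] * blocks_used <= blocks_total]
    if not fitting:
        return None
    # Hypothetical policy: most streams wins; ties go to the earliest candidate.
    return max(fitting, key=lambda c: c[1])


def split_streams(elements: Sequence[int], n_streams: int) -> List[List[int]]:
    """Round-robin split of a flattened tensor into n_streams (assumed scheme)."""
    return [list(elements[i::n_streams]) for i in range(n_streams)]


if __name__ == "__main__":
    choice = select_mode(partial_bitlength=True, blocks_used=2, blocks_total=8)
    if choice is not None:
        mode, n = choice
        print(mode.name, split_streams(list(range(8)), n))
```

Under these assumptions, a partial bitlength computation using 2 of 8 blocks is given four streams in the multiport partial bitlength mode, while a full bitlength computation using 4 of 8 blocks can only use the multiport mode with two streams.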