US 12,135,681 B2
Coprocessors with bypass optimization, variable grid architecture, and fused vector operations
Aditya Kesiraju, Los Gatos, CA (US); Andrew J. Beaumont-Smith, Cambridge, MA (US); Boris S. Alvarez-Heredia, Redwood City, CA (US); and Ran A. Chachick, Providence, RI (US)
Assigned to Apple Inc., Cupertino, CA (US)
Filed by Apple Inc., Cupertino, CA (US)
Filed on Jul. 20, 2022, as Appl. No. 17/869,617.
Application 17/869,617 is a division of application No. 16/286,170, filed on Feb. 26, 2019, granted, now 11,429,555.
Prior Publication US 2022/0350776 A1, Nov. 3, 2022
Int. Cl. G06F 15/80 (2006.01); G06F 7/544 (2006.01); G06F 7/57 (2006.01); G06F 9/30 (2018.01); G06F 9/38 (2018.01)
CPC G06F 15/8053 (2013.01) [G06F 7/5443 (2013.01); G06F 7/57 (2013.01); G06F 9/3818 (2013.01); G06F 9/3828 (2013.01); G06F 9/3877 (2013.01)]
18 Claims
 
1. A coprocessor comprising:
a plurality of processing element circuits arranged in a grid of rows and columns, wherein a given processing element circuit of the plurality of processing element circuits comprises an arithmetic-logic unit (ALU) circuit configured to perform one or more ALU operations on a plurality of input operands to generate a result; and
a queue circuit coupled to the plurality of processing element circuits and including a scheduler circuit configured to issue instruction operations to the plurality of processing element circuits, wherein a first given instruction operation is of either a matrix mode type that causes computations in multiple rows of the grid or a vector mode type that causes computations in a first row of the grid, and wherein the scheduler circuit is configured to concurrently issue, as fused instruction operations, a second given instruction operation with the first given instruction operation based on the first given instruction operation being of the vector mode type and further based on the second given instruction operation being of the vector mode type and using a second row of the grid different from the first row, wherein the first row and the second row are determined based on respective destination identifiers of the first given instruction operation and the second given instruction operation.
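The fusion condition recited in claim 1 can be illustrated with a minimal software sketch. It is not the patented circuit: the names `Op`, `Mode`, `row_for`, `can_fuse`, and `issue` are hypothetical, the grid height of 8 is arbitrary, and the mapping from destination identifier to row (a simple modulo) is an assumption, since the claim says only that the rows are determined based on the destination identifiers. The sketch also considers only adjacent queue entries and ignores data dependencies.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple

GRID_ROWS = 8  # hypothetical grid height, chosen for illustration only


class Mode(Enum):
    MATRIX = "matrix"  # computations in multiple rows of the grid
    VECTOR = "vector"  # computations in a single row of the grid


@dataclass(frozen=True)
class Op:
    name: str
    mode: Mode
    dest_id: int  # destination identifier carried by the instruction operation


def row_for(op: Op) -> int:
    # Assumption: the row is the destination identifier modulo the grid height.
    # The claim only requires that the row be derived from the identifier.
    return op.dest_id % GRID_ROWS


def can_fuse(first: Op, second: Op) -> bool:
    # Both operations must be vector mode and must use different rows.
    return (
        first.mode is Mode.VECTOR
        and second.mode is Mode.VECTOR
        and row_for(first) != row_for(second)
    )


def issue(queue: List[Op]) -> List[Tuple[Op, ...]]:
    # Walk the queue in order; issue a fused pair when the next two
    # operations qualify, otherwise issue a single operation.
    issued: List[Tuple[Op, ...]] = []
    i = 0
    while i < len(queue):
        first = queue[i]
        if i + 1 < len(queue) and can_fuse(first, queue[i + 1]):
            issued.append((first, queue[i + 1]))
            i += 2
        else:
            issued.append((first,))
            i += 1
    return issued


queue = [
    Op("v0", Mode.VECTOR, 0),
    Op("v1", Mode.VECTOR, 1),   # different row from v0: fused with v0
    Op("m0", Mode.MATRIX, 2),   # matrix mode: issued alone
    Op("v2", Mode.VECTOR, 3),
    Op("v3", Mode.VECTOR, 11),  # same row as v2 under the modulo mapping: not fused
]
groups = issue(queue)
```

Under these assumptions, `v0` and `v1` issue together as a fused pair, while `m0`, `v2`, and `v3` each issue alone: `m0` because it is matrix mode, and `v2`/`v3` because their destination identifiers map to the same row.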