US 11,907,713 B2
Apparatuses, methods, and systems for fused operations using sign modification in a processing element of a configurable spatial accelerator
Kermin E. Chofleming, Hudson, MA (US); Chuanjun Zhang, Santa Clara, CA (US); Daniel Towner, Freshford Bath (GB); Simon C. Steely, Jr., Hudson, NH (US); and Benjamin Keen, Marlborough, MA (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Dec. 28, 2019, as Appl. No. 16/729,369.
Prior Publication US 2021/0200540 A1, Jul. 1, 2021
Int. Cl. G06F 9/30 (2018.01); G06F 15/80 (2006.01)
CPC G06F 9/3001 (2013.01) [G06F 9/30181 (2013.01); G06F 15/80 (2013.01)]
24 Claims
 
1. An apparatus comprising:
a plurality of processing elements; and
a hardware network between the plurality of processing elements to transfer values between the plurality of processing elements,
wherein a processing element of the plurality of processing elements comprises:
a first plurality of input queues having a multiple bit width coupled to the hardware network,
at least one first output queue having the multiple bit width coupled to the hardware network,
a second one or more input queues coupled to the hardware network and having a second width that is narrower than the multiple bit width of the first plurality of input queues,
operation circuitry coupled to the first plurality of input queues having the multiple bit width,
a sign modification circuit coupled to the first plurality of input queues having the multiple bit width and the second one or more input queues having the second width, and
a configuration register within the processing element to store a configuration value comprising a sign modification field that causes the sign modification circuit to modify a sign bit of a value from the first plurality of input queues according to the sign modification field to create a sign modified value, and the configuration value causes the operation circuitry to perform a selected operation of a plurality of operations on a value from the first plurality of input queues and the sign modified value to create a resultant value, and store the resultant value in the at least one first output queue, wherein the sign modification field of the configuration register is selectable between storage of a conditional command value that causes the sign modification circuit to use a value stored, by another processing element, in the second one or more input queues to control the modification of the sign bit, and an unconditional command value that unconditionally controls the modification of the sign bit by the sign modification circuit.
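
The behavior recited in claim 1 can be illustrated with a small software model. The following is a minimal sketch, not the patented circuit: the claim does not specify field encodings, data widths, or the set of operations, so the `SignMod` encodings, the 32-bit IEEE 754 data format, the 1-bit control queue, and the `add`/`mul` operations are all assumptions made for illustration. Names such as `ProcessingElement`, `Config`, and `step` are hypothetical.

```python
import struct
from collections import deque
from dataclasses import dataclass
from enum import Enum

SIGN_BIT = 0x80000000  # assumed: values are 32-bit IEEE 754 floats


class SignMod(Enum):
    """Hypothetical encodings of the sign modification field."""
    PASS = 0         # unconditional: leave the sign bit unchanged
    NEGATE = 1       # unconditional: always flip the sign bit
    COND_NEGATE = 2  # conditional: flip only when the control value is 1


@dataclass
class Config:
    """Stands in for the configuration register's configuration value."""
    sign_mod: SignMod
    op: str  # assumed operation set: "add" or "mul"


def f2b(x: float) -> int:
    """Encode a float as its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def b2f(bits: int) -> float:
    """Decode a 32-bit pattern back into a float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


class ProcessingElement:
    def __init__(self, config: Config):
        self.config = config
        self.in_a = deque()  # wide input queue (operand passed through)
        self.in_b = deque()  # wide input queue (operand that is sign modified)
        self.ctrl = deque()  # narrower input queue, filled by another PE
        self.out = deque()   # wide output queue

    def step(self) -> bool:
        """Fire once if all required inputs are present; return True if fired."""
        mode = self.config.sign_mod
        needs_ctrl = mode is SignMod.COND_NEGATE
        if not self.in_a or not self.in_b or (needs_ctrl and not self.ctrl):
            return False  # stall until the operands (and control) arrive

        a = self.in_a.popleft()
        b = self.in_b.popleft()

        # Sign modification stage: the field selects between an unconditional
        # command and one controlled by the value in the narrow queue.
        if mode is SignMod.NEGATE:
            flip = True
        elif mode is SignMod.COND_NEGATE:
            flip = bool(self.ctrl.popleft() & 1)
        else:
            flip = False
        if flip:
            b ^= SIGN_BIT

        # Operation stage: fused with the sign modification in one firing.
        fa, fb = b2f(a), b2f(b)
        if self.config.op == "add":
            result = fa + fb
        elif self.config.op == "mul":
            result = fa * fb
        else:
            raise ValueError(f"unsupported operation: {self.config.op}")
        self.out.append(f2b(result))
        return True
```

In this model, configuring `SignMod.NEGATE` with `op="add"` turns an addition into a subtraction within a single firing, while `SignMod.COND_NEGATE` lets an upstream element choose per value, through the narrow queue, whether the second operand is negated. The sketch also stalls when the conditional mode is selected and no control value has arrived; that dataflow-style firing rule is an assumption here, not something claim 1 recites.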