US 12,190,111 B2
Apparatus and method for vector packed dual complex-by-complex and dual complex-by-complex conjugate multiplication
Venkateswara Rao Madduri, Austin, TX (US); Robert Valentine, Kiryat Tivon (IL); and Mark J. Charney, Lexington, MA (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Jun. 26, 2021, as Appl. No. 17/359,561.
Prior Publication US 2023/0004390 A1, Jan. 5, 2023
Int. Cl. G06F 9/30 (2018.01); G06F 7/487 (2006.01); G06F 7/499 (2006.01); G06F 7/501 (2006.01)
CPC G06F 9/30036 (2013.01) [G06F 7/4876 (2013.01); G06F 7/49947 (2013.01); G06F 7/501 (2013.01); G06F 9/3001 (2013.01); G06F 9/30101 (2013.01); G06F 9/30145 (2013.01)]
25 Claims
1. A processor comprising:
a decoder to decode a first instruction to generate a decoded instruction, the first instruction including operands to identify a first plurality of complex numbers comprising a first plurality of packed real and imaginary data elements, a second plurality of complex numbers comprising a second plurality of packed real and imaginary data elements, and an immediate value;
a first source register to store the first plurality of complex numbers including the first plurality of packed real and imaginary data elements;
a second source register to store the second plurality of complex numbers including the second plurality of packed real and imaginary data elements;
execution circuitry to execute the decoded instruction, the execution circuitry comprising:
multiplier circuitry to multiply each of the first plurality of packed real data elements with a corresponding packed real data element of the second plurality to generate a first plurality of real products, and to multiply each of the first plurality of packed imaginary data elements with a corresponding packed imaginary data element of the second plurality to generate a second plurality of real products,
the multiplier circuitry further to multiply each of the first plurality of real data elements with a corresponding packed imaginary data element of the second plurality to generate a first plurality of imaginary products, and to multiply each of the first plurality of packed imaginary data elements with a corresponding packed real data element of the second plurality to generate a second plurality of imaginary products;
adder network to subtract, from each real product of the first plurality, a corresponding real product of the second plurality to generate a plurality of real results, and to add each imaginary product of the first plurality with a corresponding imaginary product of the second plurality to generate a plurality of imaginary results,
the adder network further to scale each of the plurality of real and imaginary results according to the immediate value to generate a plurality of scaled results, round each of the plurality of scaled real results according to a rounding method to generate a plurality of rounded results; and
saturation circuitry to saturate each of the plurality of rounded results to generate a plurality of final results;
and
a destination register to store each of the plurality of final results in a corresponding data element position of the destination register.
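
The data flow recited in claim 1 (four multiplications per complex pair, a subtraction for the real part, an addition for the imaginary part, then scaling, rounding, and saturation) can be illustrated with a minimal software sketch. Nothing below is taken from the patent's embodiments. The following are assumptions made only for illustration: 16-bit signed data elements, interleaved real/imaginary packing, scaling modeled as an arithmetic right shift by the immediate value, and round-half-up as the rounding method. The function names `packed_complex_mul`, `scale_round`, and `saturate16` are hypothetical.

```python
# Illustrative model only; element width, packing order, scaling and
# rounding behavior are assumptions, not details recited in the claim.
INT16_MIN, INT16_MAX = -32768, 32767


def saturate16(x):
    """Clamp x to the signed 16-bit range (assumed element width)."""
    return max(INT16_MIN, min(INT16_MAX, x))


def scale_round(x, shift):
    """Scale by an arithmetic right shift of `shift` bits, rounding half up.

    Assumption: the immediate value is interpreted as a shift count.
    """
    if shift == 0:
        return x
    return (x + (1 << (shift - 1))) >> shift


def packed_complex_mul(src1, src2, imm):
    """Multiply packed complex numbers element-wise.

    src1, src2: flat lists [re0, im0, re1, im1, ...] (assumed layout).
    Returns a flat list with the same layout.
    """
    dest = []
    for i in range(0, len(src1), 2):
        a_re, a_im = src1[i], src1[i + 1]
        b_re, b_im = src2[i], src2[i + 1]
        # First and second real products, subtracted to form the real result.
        real = a_re * b_re - a_im * b_im
        # First and second imaginary products, added to form the imaginary result.
        imag = a_re * b_im + a_im * b_re
        # Scale, round, then saturate each result before storing it.
        dest.append(saturate16(scale_round(real, imm)))
        dest.append(saturate16(scale_round(imag, imm)))
    return dest
```

For example, with these assumptions `packed_complex_mul([3, 4], [1, 2], 0)` computes (3 + 4i)(1 + 2i) and yields `[-5, 10]`, while an immediate of 1 halves each result with rounding, yielding `[-2, 5]`.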