US 11,755,323 B2
Apparatus and method for complex by complex conjugate multiplication
Venkateswara Madduri, Austin, TX (US); Elmoustapha Ould-Ahmed-Vall, Chandler, AZ (US); Jesus Corbal, King City, OR (US); Mark Charney, Lexington, MA (US); Robert Valentine, Kiryat Tivon (IL); and Binwei Yang, Austin, TX (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Feb. 15, 2022, as Appl. No. 17/672,504.
Application 17/672,504 is a continuation of application No. 15/721,448, filed on Sep. 29, 2017, granted, now 11,256,504.
Prior Publication US 2022/0171624 A1, Jun. 2, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 9/30 (2018.01)
CPC G06F 9/30036 (2013.01) [G06F 9/3001 (2013.01); G06F 9/30105 (2013.01)]
24 Claims
 
1. A processor comprising:
a hardware decoder to decode a first instruction to generate a decoded instruction;
a first source register to store a first plurality of packed real and imaginary data elements comprising a first plurality of complex numbers;
a second source register to store a second plurality of packed real and imaginary data elements comprising a second plurality of complex numbers, wherein each of the second plurality of complex numbers comprises a complex conjugate of a corresponding complex number of the first plurality of complex numbers; and
execution circuitry to execute the decoded instruction, the execution circuitry comprising:
at least one hardware multiplier to multiply selected real and imaginary data elements in the first source register and the second source register, wherein the at least one hardware multiplier is to multiply each selected imaginary data element in the first source register with a selected real data element in the second source register and to multiply each selected real data element in the first source register with a selected imaginary data element in the second source register to generate a plurality of imaginary products;
at least one hardware adder to add a first subset of the plurality of imaginary products and subtract a second subset of the plurality of imaginary products to generate a first temporary result and to add a third subset of the plurality of imaginary products and subtract a fourth subset of the plurality of imaginary products to generate a second temporary result; and
accumulation circuitry to combine the first temporary result with first data from a destination register to generate a first final result and to combine the second temporary result with second data from the destination register to generate a second final result and to store the first final result and the second final result back in the destination register.
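
The data flow recited in claim 1 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed hardware: the claim does not specify element width, register layout, saturation or rounding, or which products fall into which subset. The model below assumes an interleaved [real, imaginary, real, imaginary, ...] layout, unbounded Python integers, one temporary result per complex pair, and the sign convention Im(x * conj(y)) = x.im * y.re - x.re * y.im. The function name and parameters are hypothetical.

```python
def conj_mul_imag_accumulate(src1, src2, dest):
    """Software model (assumption-laden sketch) of claim 1's data flow.

    src1, src2: packed lists laid out as [re0, im0, re1, im1, ...] (assumed layout).
    dest: one accumulator value per complex pair (assumed granularity).
    Returns the new destination contents; inputs are not modified.
    """
    if len(src1) != len(src2) or len(src1) != 2 * len(dest):
        raise ValueError("sources must hold one complex pair per destination element")

    result = []
    for i, acc in enumerate(dest):
        x_re, x_im = src1[2 * i], src1[2 * i + 1]
        y_re, y_im = src2[2 * i], src2[2 * i + 1]

        # Multiplier stage: imaginary-by-real and real-by-imaginary products.
        added_product = x_im * y_re
        subtracted_product = x_re * y_im

        # Adder stage: add one product, subtract the other (assumed sign convention).
        temporary = added_product - subtracted_product

        # Accumulation stage: combine with the existing destination data.
        result.append(acc + temporary)
    return result


# Two complex pairs: (1+2i, 5+6i) and (3+4i, 7+8i), accumulators 10 and 20.
print(conj_mul_imag_accumulate([1, 2, 3, 4], [5, 6, 7, 8], [10, 20]))  # [14, 24]
```

In this model the first final result is 10 + (2*5 - 1*6) = 14 and the second is 20 + (4*7 - 3*8) = 24, which matches the imaginary parts of (1+2i)(5-6i) and (3+4i)(7-8i) added to the prior destination values.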