US 12,079,628 B2
Apparatus and method for loop flattening and reduction in a single instruction multiple data (SIMD) pipeline
William M. Brown, Beaverton, OR (US); Roland Schulz, Portland, OR (US); and Karthik Raman, Beaverton, OR (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Oct. 4, 2021, as Appl. No. 17/493,667.
Application 17/493,667 is a continuation of application No. 16/554,169, filed on Aug. 28, 2019, granted, now 11,138,008.
Application 16/554,169 is a continuation of application No. 15/859,046, filed on Dec. 29, 2017, granted, now 10,409,601, issued on Sep. 10, 2019.
Prior Publication US 2022/0100509 A1, Mar. 31, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 9/30 (2018.01); G06F 9/38 (2018.01)
CPC G06F 9/30032 (2013.01) [G06F 9/30018 (2013.01); G06F 9/30036 (2013.01); G06F 9/30098 (2013.01); G06F 9/30145 (2013.01); G06F 9/3887 (2013.01)]
21 Claims
 
1. A processor comprising:
a decoder configured to decode a move instruction to generate a decoded move instruction identifying a plurality of operations, the move instruction including an opcode, and first and second source operands, the move instruction having a split value associated therewith;
a first source register associated with the first source operand to store a first plurality of packed data elements;
a second source register associated with the second source operand to store a second plurality of packed data elements; and
execution circuitry configured to execute the operations of the decoded move instruction, the execution circuitry configured to select a first set of contiguous data elements from the first source register to generate a first result and configured to select a second set of contiguous data elements from the second source register to generate a second result and to store the first and second results in first and second locations of a destination vector register,
wherein the execution circuitry is configured to determine the first set of contiguous data elements and the second set of contiguous data elements in accordance with the split value associated with the move instruction.
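
The data movement recited in claim 1 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed hardware: the function name `split_move` is hypothetical, registers are modeled as Python lists of equal length N, and the mapping from the split value to element ranges is assumed to be "elements below the split come from the first source, elements at or above the split come from the second source". The claim itself requires only that both contiguous sets be determined in accordance with the split value.

```python
from typing import List, Sequence


def split_move(src1: Sequence[int], src2: Sequence[int], split: int) -> List[int]:
    """Software model of a split-controlled move over two packed sources.

    Assumption (not fixed by the claim): elements [0, split) are taken from
    src1 and elements [split, N) are taken from src2.
    """
    n = len(src1)
    if len(src2) != n:
        raise ValueError("source registers must hold the same number of elements")
    if not 0 <= split <= n:
        raise ValueError("split must lie in the range 0..N")

    first_result = list(src1[:split])   # contiguous run from the first source
    second_result = list(src2[split:])  # contiguous run from the second source

    # The two results occupy adjacent regions of the destination register.
    return first_result + second_result


if __name__ == "__main__":
    a = [0, 1, 2, 3, 4, 5, 6, 7]
    b = [10, 11, 12, 13, 14, 15, 16, 17]
    print(split_move(a, b, 3))
```

In this model a split of 0 copies the second source unchanged and a split of N copies the first source unchanged, so a single value selects both contiguous sets.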