| CPC G06F 9/30036 (2013.01) [G06F 9/30038 (2023.08); G06F 9/3012 (2013.01)] | 20 Claims |

|
1. An integrated circuit comprising:
a vector register file configured to store register values of an instruction set architecture;
an input data operand buffer;
an intermediate result operand buffer;
a completion flags buffer; and
a vector permutation circuitry configured to, responsive to a vector permutation instruction identifying a vector of indices stored in the vector register file, a vector of source data stored in the vector register file, and a destination vector to be stored in the vector register file:
read the vector of source data into the input data operand buffer;
expand an element index from the vector of indices to obtain byte indices for respective bytes of a corresponding element;
store the byte indices in corresponding bytes of the intermediate result operand buffer;
update bits in the completion flags buffer corresponding to the bytes of the intermediate result operand buffer storing the byte indices to indicate that the corresponding bytes store indices;
identify bytes of the input data operand buffer storing an element of the vector of source data pointed to by the element index based on the byte indices stored in the intermediate result operand buffer;
overwrite the byte indices in the intermediate result operand buffer with the identified bytes from the input data operand buffer; and
responsive to overwriting the byte indices in the intermediate result operand buffer with the identified bytes from the input data operand buffer, update the corresponding bits in the completion flags buffer to indicate that the corresponding bytes store data to be written to the destination vector.
|
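The sequence of operations recited in claim 1 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed circuitry: the function name `permute`, the three-value flag encoding (`EMPTY`, `HOLDS_INDEX`, `HOLDS_DATA`), the fixed element width `elem_bytes`, the use of Python lists for the buffers, and the rejection of out-of-range indices are all hypothetical choices made for illustration and do not come from the claim.

```python
# Hypothetical flag encoding for the completion flags buffer (an assumption;
# the claim only requires bits that distinguish "stores indices" from
# "stores data to be written to the destination vector").
EMPTY, HOLDS_INDEX, HOLDS_DATA = 0, 1, 2


def permute(source: bytes, indices: list[int], elem_bytes: int) -> bytes:
    """Software model of the gather-style permutation described in claim 1."""
    num_src_elems = len(source) // elem_bytes
    n_bytes = len(indices) * elem_bytes

    # Read the vector of source data into the input data operand buffer.
    input_buf = bytearray(source)

    # Intermediate result operand buffer and completion flags, one entry per byte.
    intermediate = [0] * n_bytes
    flags = [EMPTY] * n_bytes

    # Expand each element index into byte indices, store them in the
    # corresponding bytes of the intermediate buffer, and flag those bytes
    # as holding indices.
    for dst_elem, elem_idx in enumerate(indices):
        if not 0 <= elem_idx < num_src_elems:
            # Assumption: out-of-range indices are rejected in this model.
            raise IndexError(f"element index {elem_idx} out of range")
        for b in range(elem_bytes):
            pos = dst_elem * elem_bytes + b
            intermediate[pos] = elem_idx * elem_bytes + b
            flags[pos] = HOLDS_INDEX

    # Use each stored byte index to identify the source byte, overwrite the
    # index with that byte, then flag the byte as holding destination data.
    for pos in range(n_bytes):
        if flags[pos] == HOLDS_INDEX:
            intermediate[pos] = input_buf[intermediate[pos]]
            flags[pos] = HOLDS_DATA

    # Every byte must hold data before the destination vector is written.
    assert all(f == HOLDS_DATA for f in flags)
    return bytes(intermediate)


# Usage: four 2-byte source elements, reordered by the index vector [3, 0, 2, 1].
src = bytes([0x10, 0x11, 0x20, 0x21, 0x30, 0x31, 0x40, 0x41])
dst = permute(src, [3, 0, 2, 1], elem_bytes=2)
```

The model processes bytes sequentially for readability. Reusing the intermediate buffer for both indices and data, with per-byte flags recording which of the two each byte currently holds, mirrors the structure of the claim; how hardware would schedule or parallelize these steps is not addressed here.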