US 12,147,810 B2
Processor, operation method, and load-store device for implementation of accessing vector strided memory
Chia-Wei Hsu, Tainan (TW)
Assigned to ANDES TECHNOLOGY CORPORATION, Hsinchu (TW)
Filed by ANDES TECHNOLOGY CORPORATION, Hsinchu (TW)
Filed on Jun. 22, 2022, as Appl. No. 17/846,030.
Prior Publication US 2023/0418614 A1, Dec. 28, 2023
Int. Cl. G06F 9/34 (2018.01); G06F 9/30 (2018.01); G06F 9/345 (2018.01)
CPC G06F 9/3455 (2013.01) [G06F 9/30036 (2013.01); G06F 9/30043 (2013.01); G06F 9/3012 (2013.01)]
27 Claims
 
1. A processor adapted to access a memory and comprising:
a vector register file; and
a load-store logic circuit coupled to the vector register file and configured to perform a first strided operation and a second strided operation on the memory, wherein the load-store logic circuit reads a plurality of first data elements at a first plurality of discrete addresses in the memory and writes the first data elements into the vector register file in a current iteration of the first strided operation, and the load-store logic circuit reads a plurality of second data elements from the vector register file and respectively writes the second data elements into a second plurality of discrete addresses in the memory in a current iteration of the second strided operation,
wherein the load-store logic circuit comprises:
a strided address logic circuit generating a plurality of strided addresses based on a least significant bits part of a current base address and a stride, wherein each of the plurality of strided addresses has a carry part and an offset part, the strided address logic circuit calculates {Cn,OFFn}=(LSB1+LSB2*(n−1)) to generate an n-th strided address {Cn,OFFn} among N of the plurality of strided addresses, the N of the plurality of strided addresses are used for the current iteration of the first strided operation or the second strided operation, N is an integer, n is an integer greater than 0 and less than or equal to N, LSB2 is a least significant bits part of the stride, LSB1 is a least significant bits part of the current base address, OFFn is the offset part of the n-th strided address, and Cn is the carry part of the n-th strided address; and
a load-store circuit coupled to the strided address logic circuit to receive the plurality of strided addresses, wherein the load-store circuit comprises:
a first line buffer configured to read a plurality of bytes at the first plurality of discrete addresses from the memory based on a most significant bits part of the current base address in the current iteration of the first strided operation, wherein the bytes comprise the first data elements;
a control circuit coupled to the strided address logic circuit to receive the plurality of strided addresses, wherein the control circuit selects at least one of the offset parts of the plurality of strided addresses based on a data element length to generate N offset values, and the control circuit rotates the offset values based on a write pointer to generate N multiplexer select signals; and
a load circuit coupled to the control circuit to receive the multiplexer select signals and configured to collect the first data elements from the bytes of the first line buffer based on the multiplexer select signals.
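
The arithmetic recited for the strided address logic circuit and the control circuit can be illustrated in software. The following is a minimal sketch, not the claimed hardware: the function names, the lsb_bits parameter (the assumed width of the offset part), and the rotation direction are all hypothetical choices made for illustration. The claim states only that {Cn,OFFn}=(LSB1+LSB2*(n−1)) and that the offset values are rotated based on a write pointer.

```python
def strided_addresses(base, stride, n_count, lsb_bits):
    """Return [(Cn, OFFn), ...] for n = 1..n_count.

    lsb_bits is an assumed width of the offset part; the claim does
    not fix it.
    """
    mask = (1 << lsb_bits) - 1
    lsb1 = base & mask    # LSB1: low bits of the current base address
    lsb2 = stride & mask  # LSB2: low bits of the stride
    result = []
    for n in range(1, n_count + 1):
        total = lsb1 + lsb2 * (n - 1)
        carry = total >> lsb_bits  # Cn: bits that overflow the offset field
        offset = total & mask      # OFFn: position within the offset field
        result.append((carry, offset))
    return result


def rotate_offsets(offsets, write_pointer):
    """Rotate a list of offset values by write_pointer positions.

    Rotating toward higher indices is an assumption; the claim says
    only that the rotation is based on a write pointer.
    """
    k = write_pointer % len(offsets)
    return offsets[-k:] + offsets[:-k] if k else list(offsets)


if __name__ == "__main__":
    addrs = strided_addresses(base=0x3C, stride=6, n_count=4, lsb_bits=4)
    print(addrs)
    print(rotate_offsets([off for _, off in addrs], write_pointer=1))
```

With a base address of 0x3C, a stride of 6, and a 4-bit offset part, LSB1 is 12 and LSB2 is 6, so the four sums are 12, 18, 24, and 30. These split into the (carry, offset) pairs (0, 12), (1, 2), (1, 8), and (1, 14).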