US 12,229,570 B2
Block data load with transpose into memory
Bin He, Oviedo, FL (US); Michael John Mantor, Orlando, FL (US); Brian Emberling, Palo Alto, CA (US); Liang Huang, Pudong District (CN); and Chao Liu, Austin, TX (US)
Assigned to Advanced Micro Devices, Inc., Santa Clara, CA (US)
Filed by Advanced Micro Devices, Inc., Santa Clara, CA (US)
Filed on Sep. 25, 2022, as Appl. No. 17/952,270.
Prior Publication US 2024/0103879 A1, Mar. 28, 2024
Int. Cl. G06F 9/30 (2018.01); G06F 1/20 (2006.01); G06F 9/38 (2018.01); G06F 15/80 (2006.01)
CPC G06F 9/3887 (2013.01) [G06F 1/20 (2013.01); G06F 9/3001 (2013.01); G06F 9/30036 (2013.01); G06F 9/30043 (2013.01); G06F 9/30098 (2013.01); G06F 15/8007 (2013.01)]
20 Claims
 
1. A method comprising:
responsive to receiving, by a control circuit of a processor, a single transpose and load instruction specifying a plurality of blocks of data to be processed in a transposed form by a processor array of the processor, fetching, by the control circuit of the processor, the data for processing by storing the transposed form of the data in at least one memory module circuit of the processor array without generating an intermediate representation of the transposed form of the data; and
processing, by the processor array, the transposed form of the data stored in the at least one memory module circuit.
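
Illustrative sketch (not part of the patent text): claim 1 describes hardware behavior, but the idea of storing data in transposed form without an intermediate transposed copy can be modeled loosely in software. The Python below is a minimal analogy under stated assumptions: the function names `load_blocks_transposed` and `process`, the flat row-major source layout, and the use of one list per block to stand in for a memory module circuit are all hypothetical and do not come from the patent. Each source element is written straight to its transposed position in the destination, so no separate transposed buffer is built first.

```python
def load_blocks_transposed(src, block_rows, block_cols, num_blocks):
    """Software model of a fused transpose-and-load (hypothetical helper).

    `src` is a flat list holding `num_blocks` row-major blocks back to back.
    Returns one flat list per block containing that block's transpose,
    also in row-major order (block_cols rows of block_rows elements).
    """
    block_size = block_rows * block_cols
    local_memory = []  # assumption: one entry models one memory module
    for b in range(num_blocks):
        base = b * block_size
        dest = [0] * block_size
        for r in range(block_rows):
            for c in range(block_cols):
                # Element (r, c) of the source block is written directly
                # to position (c, r) of the destination; no intermediate
                # transposed copy is created along the way.
                dest[c * block_rows + r] = src[base + r * block_cols + c]
        local_memory.append(dest)
    return local_memory


def process(local_memory, block_rows):
    """Placeholder processing step: sum each row of every transposed block."""
    return [
        [sum(block[i:i + block_rows]) for i in range(0, len(block), block_rows)]
        for block in local_memory
    ]


# Two 2x3 blocks stored back to back in row-major order.
src = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
mem = load_blocks_transposed(src, block_rows=2, block_cols=3, num_blocks=2)
print(mem)              # [[1, 4, 2, 5, 3, 6], [7, 10, 8, 11, 9, 12]]
print(process(mem, 2))  # [[5, 7, 9], [17, 19, 21]]
```

In this model a single call covers several blocks, loosely mirroring the single instruction that specifies a plurality of blocks; how the claimed control circuit actually fetches and routes the data is not described in the claim and is not represented here.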