US 12,353,320 B2
Low latency offloading of collectives over a switch
Kishore Punniyamurthy, Austin, TX (US); Richard David Sodke, Kelowna (CA); Furkan Eris, Sunnyvale, CA (US); Sergey Blagodurov, Bellevue, WA (US); Bradford Michael Beckmann, Kirkland, WA (US); Brandon Keith Potter, Troup, TX (US); and Khaled Hamidouche, Austin, TX (US)
Assigned to Advanced Micro Devices, Inc; and ATI Technologies ULC, Markham (CA)
Filed by Advanced Micro Devices, Inc, Santa Clara, CA (US); and ATI Technologies ULC, Markham (CA)
Filed on Aug. 31, 2023, as Appl. No. 18/240,640.
Prior Publication US 2025/0077409 A1, Mar. 6, 2025
Int. Cl. G06F 12/02 (2006.01); G06F 13/16 (2006.01); G06F 13/28 (2006.01); G06F 13/40 (2006.01)
CPC G06F 12/023 (2013.01) [G06F 13/1673 (2013.01); G06F 13/28 (2013.01); G06F 13/4022 (2013.01); G06F 13/404 (2013.01)] 20 Claims
1. A system comprising:
a plurality of processing elements (PEs), wherein a symmetric memory is allocated in each of the plurality of PEs; and
a switch connected to the plurality of PEs, wherein the switch is to:
receive, from a first processing element (PE) of the plurality of PEs, a message that includes a buffer offset,
compute, based on the buffer offset, a first memory address of a first buffer in a first symmetric memory of the first PE and a second memory address of a second buffer in a second symmetric memory of a second PE of the plurality of PEs, and
initiate, based on the first memory address and the second memory address, a direct memory access operation to access the first buffer and the second buffer.
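The address computation recited in claim 1 can be illustrated with a small Python model. This is a sketch under stated assumptions, not the patented implementation. The names `Message`, `SwitchModel`, and `dma_copy`, the `dst_pe` and `length` fields, and the switch's table of each PE's symmetric-memory base address are all hypothetical. The claim also does not say which way data moves, so a put-style copy from the first buffer to the second is assumed here.

```python
from dataclasses import dataclass


@dataclass
class Message:
    src_pe: int         # first PE (the sender of the message)
    dst_pe: int         # hypothetical field: the second PE to access
    buffer_offset: int  # offset of the buffer within each PE's symmetric memory
    length: int         # hypothetical field: number of bytes to transfer


class SwitchModel:
    """Toy model of a switch that turns a symmetric-memory offset into addresses."""

    def __init__(self, symmetric_bases: dict[int, int], memory: bytearray):
        # Assumption: the base address of every PE's symmetric memory is
        # registered with the switch ahead of time.
        self.symmetric_bases = symmetric_bases
        # A flat bytearray stands in for the PEs' combined memory.
        self.memory = memory

    def compute_addresses(self, msg: Message) -> tuple[int, int]:
        # The same offset is valid in every symmetric memory, so each
        # address is that PE's base plus the offset carried by the message.
        first = self.symmetric_bases[msg.src_pe] + msg.buffer_offset
        second = self.symmetric_bases[msg.dst_pe] + msg.buffer_offset
        return first, second

    def dma_copy(self, src: int, dst: int, length: int) -> None:
        # Stand-in for a DMA engine: move bytes without involving either PE.
        self.memory[dst:dst + length] = self.memory[src:src + length]

    def handle(self, msg: Message) -> tuple[int, int]:
        first, second = self.compute_addresses(msg)
        # Assumption: a put-style transfer from the first buffer to the second.
        self.dma_copy(first, second, msg.length)
        return first, second


# Usage: two PEs whose symmetric memories start at 0x000 and 0x400.
memory = bytearray(0x800)
switch = SwitchModel({0: 0x000, 1: 0x400}, memory)
memory[0x10:0x15] = b"hello"  # buffer at offset 0x10 in PE 0's symmetric memory
addrs = switch.handle(Message(src_pe=0, dst_pe=1, buffer_offset=0x10, length=5))
```

Only an offset travels in the message. The sending PE therefore never needs the second PE's addresses, because the switch resolves both from its own base-address table. A get, or a reduction that combines the two buffers, could reuse the same address computation with a different DMA operation.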