US 12,287,745 B2
Direct memory access architecture with multi-level multi-striding
Mark William Gottscho, Mountain View, CA (US); Matthew William Ashcraft, San Carlos, CA (US); Thomas Norrie, Mountain View, CA (US); and Oliver Edward Bowen, Redwood City, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Aug. 2, 2023, as Appl. No. 18/229,616.
Application 18/229,616 is a continuation of application No. 17/728,478, filed on Apr. 25, 2022, granted, now 11,762,793.
Application 17/728,478 is a continuation of application No. 16/838,796, filed on Apr. 2, 2020, granted, now 11,314,674, issued on Apr. 26, 2022.
Claims priority of provisional application 62/977,062, filed on Feb. 14, 2020.
Prior Publication US 2024/0070098 A1, Feb. 29, 2024
Int. Cl. G06F 13/28 (2006.01); G06F 1/04 (2006.01)
CPC G06F 13/28 (2013.01) [G06F 1/04 (2013.01)]
20 Claims
 
1. A direct memory access (DMA) system, comprising:
a DMA hardware unit comprising:
a request generator configured to generate multiple memory addresses for tensor elements of a multi-dimensional tensor in parallel and, for each memory address, a respective request for a memory system to perform a memory operation for the multi-dimensional tensor; and
a progress tracker comprising:
a response reorder unit configured to maintain, for each tensor element, a status of whether a memory operation for the tensor element has been performed; and
a synchronization unit configured to provide, to a processor core, multiple partial updates that each specify an overall status of memory operations performed on the tensor elements of the multi-dimensional tensor,
wherein the progress tracker is configured to determine that a number of received responses associated with the respective requests from the memory system is greater than or equal to a threshold value, and in response to determining that the number of received responses is greater than or equal to the threshold value, cause the synchronization unit to provide a partial update to the processor core.
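
The data flow recited in claim 1 can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the patented hardware: the names `generate_addresses`, `ProgressTracker`, `lanes`, and `notify` are hypothetical, parallel address generation is modeled as batches of `lanes` addresses per simulated cycle, and the choices to reset the response counter after each partial update and to send a final update on completion are illustrative assumptions that the claim does not specify.

```python
from itertools import product


def generate_addresses(base, dims, strides, lanes=2):
    """Model of the request generator (hypothetical interface).

    Yields batches of up to `lanes` (index, address) pairs; each batch stands
    for the addresses produced in parallel during one simulated cycle.
    """
    batch = []
    for idx in product(*(range(d) for d in dims)):
        # Strided addressing: base plus the sum of index * stride per dimension.
        addr = base + sum(i * s for i, s in zip(idx, strides))
        batch.append((idx, addr))
        if len(batch) == lanes:
            yield batch
            batch = []
    if batch:
        yield batch


class ProgressTracker:
    """Model of the progress tracker (hypothetical interface)."""

    def __init__(self, total, threshold, notify):
        self.done = [False] * total  # per-element status (response reorder unit)
        self.total = total
        self.threshold = threshold
        self.notify = notify         # stands in for the synchronization unit
        self.completed = 0           # responses received overall
        self.pending = 0             # responses since the last partial update

    def on_response(self, request_id):
        """Record one response, which may arrive out of order."""
        if self.done[request_id]:
            return
        self.done[request_id] = True
        self.completed += 1
        self.pending += 1
        # Assumption: the counter resets after each update, and a final
        # update is sent once every element has completed.
        if self.pending >= self.threshold or self.completed == self.total:
            self.notify(self.completed, self.total)
            self.pending = 0


def run_demo():
    """Issue requests for a 2 x 3 tensor and answer them in reverse order."""
    dims, strides = (2, 3), (16, 4)
    requests = [p for batch in generate_addresses(0x1000, dims, strides) for p in batch]
    updates = []
    tracker = ProgressTracker(
        len(requests), threshold=4, notify=lambda done, total: updates.append((done, total))
    )
    for request_id in reversed(range(len(requests))):
        tracker.on_response(request_id)
    return [addr for _, addr in requests], updates
```

In this model, `run_demo()` returns the six strided addresses together with the list of partial updates delivered to the stand-in processor core. Raising `threshold` reduces how often the core is notified, at the cost of coarser progress information.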