CPC G06F 13/28 (2013.01) [G06F 13/4027 (2013.01)] | 20 Claims |
1. A computing system for accelerating machine-learning computations comprising:
a direct memory access component; and
one or more controllers that are communicably connected to the direct memory access component via one or more system buses, wherein each of the one or more controllers is configured to:
receive, from the direct memory access component, a token indicating a data chunk becomes available in a first circular buffer of a pre-determined size;
determine that a computation is to be performed with data including the data chunk based on the token;
generate, in response to the determination, one or more addresses corresponding to one or more data chunks within the first circular buffer that are to be retrieved for the computation, wherein:
when a generated address is greater than a pre-determined maximum associated with the first circular buffer, the pre-determined size of the first circular buffer is subtracted from the generated address; and
when the generated address is less than a pre-determined minimum associated with the first circular buffer, the pre-determined size of the first circular buffer is added to the generated address;
wherein a value of a token counter associated with the first circular buffer is initialized to −k when the direct memory access component determines that k data chunks need to be loaded to the first circular buffer without causing a computation.
|
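The address wrap-around and token-counter behavior recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation. The names `wrap_address`, `TokenCounter`, and `on_token` are hypothetical. The sketch assumes that the pre-determined minimum is the buffer's base address and the pre-determined maximum is `base + size - 1`, and that a generated address overshoots by less than one buffer size, so a single add or subtract suffices. The claim states only that the counter is initialized to −k. The rules that each token increments the counter and that a computation is triggered once the counter becomes positive are assumptions made for illustration.

```python
def wrap_address(addr, base, size):
    """Wrap a generated address back into the circular buffer.

    Assumption: minimum = base and maximum = base + size - 1.
    """
    minimum = base
    maximum = base + size - 1
    if addr > maximum:
        return addr - size  # past the end: subtract the buffer size
    if addr < minimum:
        return addr + size  # before the start: add the buffer size
    return addr


class TokenCounter:
    """Hypothetical token counter for one circular buffer."""

    def __init__(self, preload_chunks=0):
        # Initialized to -k when k chunks must be loaded without
        # causing a computation.
        self.value = -preload_chunks

    def on_token(self):
        """Record one token; return True if a computation should run.

        Assumption: each token adds 1, and a positive value means a
        chunk is available for computation and is consumed here.
        """
        self.value += 1
        if self.value > 0:
            self.value -= 1  # consume the token for this computation
            return True
        return False


# Four consecutive chunk addresses starting near the end of an
# 8-entry buffer based at address 0 wrap around to the start.
addresses = [wrap_address(a, base=0, size=8) for a in range(6, 10)]
print(addresses)

# With k = 2, the first two tokens only fill the buffer.
counter = TokenCounter(preload_chunks=2)
print([counter.on_token() for _ in range(4)])
```

Under these assumptions, the first k tokens bring the counter from −k up to zero without triggering any computation, which is one way to preload a sliding window before the first result is produced.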