CPC G06N 3/063 (2013.01) [G06N 3/084 (2013.01); G06N 5/046 (2013.01); G06N 20/00 (2019.01); G11C 11/34 (2013.01)]
20 Claims

1. A method comprising:
during a single burst cycle, receiving gradients from each of a plurality of workers into a predetermined number of gradient buffers associated with a gradient optimizer;
during the single burst cycle, providing the received gradients to a reduction block to generate reduced gradients;
during the single burst cycle, providing the reduced gradients to a gradient optimizer data path associated with the gradient optimizer;
during the single burst cycle, moving weights from at least one buffer, coupled to a memory, to the gradient optimizer;
during the single burst cycle, writing back new weights, calculated by the gradient optimizer, to the memory; and
during the single burst cycle, transmitting the new weights from the gradient optimizer to each of the plurality of workers, wherein during the single burst cycle the gradient optimizer operates on a gradient burst having a burst size equal to a fixed number of gradients.
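The steps of claim 1 can be modeled in software as shown below. This is a minimal, sequential Python sketch under stated assumptions, not the claimed hardware. It assumes one gradient buffer per worker, element-wise averaging as the reduction, plain SGD as the optimizer, and a Python list standing in for the memory. The names `Worker`, `burst_cycle`, `BURST_SIZE`, `offset`, and `lr` are hypothetical. In the claimed method all stages occur within a single burst cycle; the sketch runs them one after another for readability.

```python
from dataclasses import dataclass, field
from typing import List

BURST_SIZE = 4  # hypothetical fixed number of gradients per burst


@dataclass
class Worker:
    """Hypothetical worker: supplies one gradient burst, receives new weights."""
    gradients: List[float]
    received_weights: List[float] = field(default_factory=list)


def reduce_gradients(buffers: List[List[float]]) -> List[float]:
    # Assumed reduction: element-wise average across the per-worker buffers.
    n = len(buffers)
    return [sum(column) / n for column in zip(*buffers)]


def sgd_step(weights: List[float], grads: List[float], lr: float) -> List[float]:
    # Plain SGD stands in for the unspecified gradient optimizer data path.
    return [w - lr * g for w, g in zip(weights, grads)]


def burst_cycle(workers: List[Worker], memory: List[float],
                offset: int, lr: float = 0.1) -> List[float]:
    # 1. Receive one burst from each worker into its own gradient buffer
    #    (assumption: the predetermined number of buffers equals the worker count).
    buffers = []
    for worker in workers:
        if len(worker.gradients) != BURST_SIZE:
            raise ValueError("each worker must supply exactly BURST_SIZE gradients")
        buffers.append(list(worker.gradients))
    # 2. Reduce the buffered gradients.
    reduced = reduce_gradients(buffers)
    # 3. Move the matching slice of weights from memory into the optimizer.
    weight_buffer = memory[offset:offset + BURST_SIZE]
    # 4. Compute new weights.
    new_weights = sgd_step(weight_buffer, reduced, lr)
    # 5. Write the new weights back to memory.
    memory[offset:offset + BURST_SIZE] = new_weights
    # 6. Transmit the new weights to every worker.
    for worker in workers:
        worker.received_weights = list(new_weights)
    return new_weights
```

For example, two workers sending bursts `[1.0, 2.0, 3.0, 4.0]` and `[3.0, 2.0, 1.0, 0.0]` reduce to `[2.0, 2.0, 2.0, 2.0]`; with `lr=0.5` and stored weights of `1.0`, the weights written back to memory and sent to both workers are all `0.0`.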