US 12,353,984 B2
Hardware-assisted gradient optimization using streamed gradients
Jinwen Xi, Sunnyvale, CA (US); Bharadwaj Pudipeddi, San Jose, CA (US); and Marc Tremblay, Bellevue, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on May 3, 2023, as Appl. No. 18/311,670.
Application 18/311,670 is a continuation of application No. 16/827,367, filed on Mar. 23, 2020, granted, now 11,681,905.
Prior Publication US 2023/0274130 A1, Aug. 31, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 3/06 (2006.01); G06N 3/063 (2023.01); G06N 3/084 (2023.01); G06N 5/046 (2023.01); G06N 20/00 (2019.01); G11C 11/34 (2006.01)
CPC G06N 3/063 (2013.01) [G06N 3/084 (2013.01); G06N 5/046 (2013.01); G06N 20/00 (2019.01); G11C 11/34 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A method comprising:
during a single burst cycle, receiving gradients from each of a plurality of workers, into a predetermined number of gradient buffers associated with a gradient optimizer;
during the single burst cycle, providing the received gradients to a reduction block to generate reduced gradients;
during the single burst cycle, providing the reduced gradients to a gradient optimizer data path associated with the gradient optimizer;
during the single burst cycle, moving weights from at least one buffer, coupled to a memory, to the gradient optimizer;
during the single burst cycle, writing back new weights, calculated by the gradient optimizer, to the memory; and
during the single burst cycle, transmitting the new weights from the gradient optimizer to each of the plurality of workers, wherein during the single burst cycle the gradient optimizer operates on a gradient burst having a burst size equal to a fixed number of gradients.
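The claimed steps can be sketched in software as follows. This is a minimal illustrative model of one burst cycle, assuming the reduction block computes an element-wise mean across workers and the optimizer data path applies plain SGD with a fixed learning rate; the claim itself does not specify either rule, and all names (`burst_cycle`, `BURST_SIZE`, `lr`) are hypothetical.

```python
# Illustrative sketch of one burst cycle of the claimed method.
# Assumptions (not stated in the claim): the reduction is an
# element-wise mean, and the optimizer rule is plain SGD with a
# fixed learning rate. All identifiers are hypothetical.

BURST_SIZE = 4  # fixed number of gradients per burst


def burst_cycle(worker_gradients, weights, lr=0.1):
    """Model one burst cycle: reduce the streamed gradients received
    from every worker, update the weights streamed in for this burst,
    and return the new weights to write back and broadcast."""
    # Gradient buffers: one per worker, each holding BURST_SIZE gradients.
    assert all(len(g) == BURST_SIZE for g in worker_gradients)
    assert len(weights) == BURST_SIZE

    # Reduction block: element-wise mean across all workers' gradients.
    n = len(worker_gradients)
    reduced = [sum(g[i] for g in worker_gradients) / n
               for i in range(BURST_SIZE)]

    # Gradient optimizer data path: SGD update against the weights
    # moved from memory for this burst.
    new_weights = [w - lr * g for w, g in zip(weights, reduced)]

    # New weights are written back to memory and transmitted to
    # each worker for the next burst.
    return new_weights


# Usage: two workers, one burst of four gradients each; the reduced
# gradients are all 2.0, so each weight moves from 10.0 toward 9.8.
grads = [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0]]
weights = [10.0, 10.0, 10.0, 10.0]
print(burst_cycle(grads, weights))
```

In the hardware described by the patent, these stages are pipelined so that receiving, reducing, updating, write-back, and transmission all occur within the single burst cycle, rather than sequentially as in this software model.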