US 12,190,232 B2
Asynchronous training of machine learning model
Taifeng Wang, Redmond, WA (US); Wei Chen, Redmond, WA (US); Tie-Yan Liu, Redmond, WA (US); Fei Gao, Redmond, WA (US); and Qiwei Ye, Redmond, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Appl. No. 16/327,679
Filed by MICROSOFT TECHNOLOGY LICENSING, LLC, Redmond, WA (US)
PCT Filed Aug. 17, 2017, PCT No. PCT/US2017/047247, § 371(c)(1), (2) Date Feb. 22, 2019, PCT Pub. No. WO2018/039011, PCT Pub. Date Mar. 1, 2018.
Claims priority of application No. 201610730381.8 (CN), filed on Aug. 25, 2016.
Prior Publication US 2019/0197404 A1, Jun. 27, 2019
Int. Cl. G06N 3/08 (2023.01); G06N 3/04 (2023.01); G06N 3/045 (2023.01); G06N 3/084 (2023.01)
CPC G06N 3/08 (2013.01) [G06N 3/04 (2013.01); G06N 3/045 (2023.01); G06N 3/084 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A computer-implemented method comprising:
receiving, by a computing device from a worker implemented by a computer processing unit, feedback data generated by training a machine learning model, the feedback data being associated with previous values of a set of parameters of the machine learning model at the worker;
determining, by the computing device, differences between the previous values and current values of the set of parameters;
calculating a zero-order term and a first-order term of a series expansion based on the feedback data and the differences; and
updating the current values based on the zero-order term and the first-order term to obtain updated values of the set of the parameters, wherein updating the current values based on the zero-order term and the first-order term:
comprises applying update amounts to the current values, the update amounts including a term that is a product of a delayed gradient and a learning rate; and
provides compensation for delay between a plurality of workers implemented by one or more computer processing units that each provide respective feedback data generated by training the machine learning model, the compensation for delay reducing mismatch between the plurality of workers and enabling efficient asynchronous training of the machine learning model.
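The update recited in claim 1 can be pictured as a parameter-server step in which a stale (delayed) gradient reported by a worker is corrected by a first-order term built from how far the parameters drifted while that worker was computing. The sketch below is illustrative only and is not drawn from the patent's specification: the function name, the use of NumPy, the coefficient lam, and the element-wise gradient-product surrogate for the Hessian (as in delay-compensated asynchronous SGD) are assumptions made for the example.

```python
# Minimal sketch of a delay-compensated asynchronous update, under the
# assumptions stated above (not the patented implementation).
import numpy as np

def delay_compensated_update(w_current, w_prev, grad, lr=0.01, lam=0.04):
    """One server-side update from a worker's delayed feedback.

    w_current : current values of the set of parameters held by the server
    w_prev    : previous values at which the worker computed `grad`
    grad      : feedback data (delayed gradient) received from the worker
    lr        : learning rate
    lam       : weight of the first-order compensation term (assumed)
    """
    # Differences between the previous values and the current values.
    diff = w_current - w_prev

    # Zero-order term of the series expansion: the delayed gradient itself.
    zero_order = grad

    # First-order term: the difference scaled by an element-wise
    # gradient-product approximation of the curvature (an assumption here).
    first_order = lam * grad * grad * diff

    # Update amount includes the product of the delayed gradient and the
    # learning rate, plus the compensation term.
    return w_current - lr * (zero_order + first_order)

# Example: the server applies feedback from a worker whose snapshot is stale.
w_server = np.array([0.10, -0.30, 0.05, 0.20])   # current values
w_snapshot = np.array([0.12, -0.28, 0.00, 0.25]) # values the worker used
delayed_grad = np.array([0.50, -0.20, 0.10, 0.30])
w_server = delay_compensated_update(w_server, w_snapshot, delayed_grad)
```

Under these assumptions, the zero-order term alone reproduces ordinary asynchronous gradient descent, while the first-order term shrinks the mismatch between the parameters the worker saw and the parameters the server currently holds, which is the compensation for delay between workers described in the claim.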