US 11,687,832 B1
Training a model using parameter server shards
Gregory S. Corrado, San Francisco, CA (US); Kai Chen, San Bruno, CA (US); Jeffrey A. Dean, Palo Alto, CA (US); Samy Bengio, Los Altos, CA (US); Rajat Monga, Sunnyvale, CA (US); and Matthieu Devin, Los Altos Hills, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Aug. 3, 2020, as Appl. No. 16/983,979.
Application 16/983,979 is a continuation of application No. 15/665,236, filed on Jul. 31, 2017, granted, now 10,733,535.
Application 15/665,236 is a continuation of application No. 15/231,534, filed on Aug. 8, 2016, granted, now 9,721,214, issued on Aug. 1, 2017.
Application 15/231,534 is a continuation of application No. 14/817,745, filed on Aug. 4, 2015, granted, now 9,412,065, issued on Aug. 9, 2016.
Application 14/817,745 is a continuation of application No. 13/826,327, filed on Mar. 14, 2013, granted, now 9,218,573, issued on Dec. 22, 2015.
Claims priority of provisional application 61/664,133, filed on Jun. 25, 2012.
Claims priority of provisional application 61/654,806, filed on Jun. 1, 2012.
Claims priority of provisional application 61/650,427, filed on May 22, 2012.
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 20/00 (2019.01); G06N 3/063 (2023.01); G06N 3/08 (2023.01); G06N 7/08 (2006.01); G06N 5/025 (2023.01); G06F 18/214 (2023.01); G06F 18/2411 (2023.01); G06N 7/01 (2023.01)
CPC G06N 20/00 (2019.01) [G06N 3/063 (2013.01); G06N 3/08 (2013.01); G06N 7/08 (2013.01); G06F 18/214 (2023.01); G06F 18/2411 (2023.01); G06N 5/025 (2013.01); G06N 7/01 (2023.01)] 16 Claims
OG exemplary drawing
 
1. A system for training a machine learning model having parameters by determining a respective trained parameter value for each of the parameters of the machine learning model, the system comprising:
one or more server computing units; and
a parameter server executing on the one or more server computing units,
wherein the parameter server is configured to maintain and asynchronously update values of each of the parameters of the machine learning model based on delta values received from a plurality of model replicas,
wherein each model replica executes on a respective replica computing unit,
wherein each of the plurality of model replicas is configured to maintain an identical instance of the machine learning model with possibly different parameter values for the parameters of the machine learning model and to operate independently of each other model replica, and
wherein each model replica is further configured to asynchronously request parameter values from the parameter server, determine delta values for the parameters based on stochastic gradient descent, and provide the delta values to the parameter server.
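The architecture recited in claim 1 can be illustrated with a minimal sketch: a parameter server holds the authoritative parameter values, while independent replicas asynchronously fetch parameters, compute stochastic-gradient delta values, and push them back. This is not the patented implementation; the class and function names, the toy least-squares model, and the use of Python threads are illustrative assumptions only.

```python
import threading
import random

class ParameterServer:
    """Maintains the authoritative parameter values and applies
    delta updates asynchronously as replicas send them in."""
    def __init__(self, init_params):
        self._params = dict(init_params)
        self._lock = threading.Lock()

    def get_params(self):
        # Replicas request a snapshot of the current values.
        with self._lock:
            return dict(self._params)

    def apply_deltas(self, deltas):
        # Deltas from different replicas arrive in any order and
        # are applied as received, without global synchronization.
        with self._lock:
            for name, delta in deltas.items():
                self._params[name] += delta

def replica(server, data, lr, steps):
    """One model replica: repeatedly fetch parameters, compute a
    stochastic-gradient delta on one sampled example, and push the
    delta back. Toy model (an assumption): least squares, y = w*x."""
    for _ in range(steps):
        params = server.get_params()            # asynchronous fetch
        x, y = random.choice(data)              # stochastic sample
        grad = 2.0 * (params["w"] * x - y) * x  # d/dw of (w*x - y)^2
        server.apply_deltas({"w": -lr * grad})  # asynchronous push

# Two replicas train independently; the true weight is 3.0.
random.seed(0)
data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]
server = ParameterServer({"w": 0.0})
threads = [threading.Thread(target=replica, args=(server, data, 0.05, 500))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(server.get_params()["w"])
```

Because each replica fetches and pushes without waiting for the others, some deltas are computed against slightly stale parameter values; in practice this asynchronous scheme still converges for the stochastic gradient descent recited in the claim.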