US 11,966,766 B2
Reduction server for fast distributed training
Chang Lan, Kirkland, WA (US); and Soroush Radpour, Vancouver (CA)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Oct. 21, 2020, as Appl. No. 17/076,393.
Prior Publication US 2022/0121465 A1, Apr. 21, 2022
Int. Cl. G06F 9/455 (2018.01); G06N 3/08 (2023.01); G06N 3/084 (2023.01); G06N 3/098 (2023.01)
CPC G06F 9/45558 (2013.01) [G06N 3/08 (2013.01); G06N 3/084 (2013.01); G06F 2009/45562 (2013.01); G06F 2009/4557 (2013.01); G06N 3/098 (2023.01)] 19 Claims
OG exemplary drawing
 
15. A data processing method for synchronous distributed training, comprising:
instantiating a first set of virtual machines running one or more worker processes, each worker process operating on a respective data block to produce a respective gradient;
instantiating a second set of virtual machines that comprise a reduction server, the second set of virtual machines being stateless and running one or more reducer processes that operate on at least a portion of each respective gradient to generate an aggregated gradient;
configuring the second set of virtual machines such that computations or communications are balanced across the one or more reducer processes; and
broadcasting, by the reduction server, the aggregated gradient to the first set of virtual machines, and
wherein a first total bandwidth associated with use of the first set of virtual machines is equal to a second total bandwidth associated with use of the reduction server.
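The claim above describes workers producing gradient shards, stateless reducers each summing a balanced portion of every worker's gradient, and the aggregated result being broadcast back. A minimal sketch of that step, in plain Python with illustrative names not taken from the patent:

```python
def reduction_server_step(worker_grads, num_reducers):
    """Simulate one synchronous reduction-server step (illustrative sketch).

    Each worker's gradient vector is split into `num_reducers` shards;
    stateless reducer r sums shard r across all workers, balancing the
    work, and the aggregated shards are reassembled for broadcast.
    """
    n = len(worker_grads[0])
    # Shard boundaries chosen so computation is balanced across reducers.
    bounds = [round(i * n / num_reducers) for i in range(num_reducers + 1)]
    aggregated = []
    for r in range(num_reducers):
        lo, hi = bounds[r], bounds[r + 1]
        # Reducer r holds no model state: it only sums its shard of
        # every worker's gradient for this training step.
        aggregated.extend(
            sum(g[i] for g in worker_grads) for i in range(lo, hi)
        )
    # The same aggregated gradient is broadcast to every worker VM.
    return aggregated

# Example: 3 workers, 4-element gradients, 2 reducer processes.
grads = [[1.0, 2.0, 3.0, 4.0],
         [0.5, 0.5, 0.5, 0.5],
         [1.5, 1.5, 1.5, 1.5]]
print(reduction_server_step(grads, num_reducers=2))
# → [3.0, 4.0, 5.0, 6.0]
```

Because each reducer both receives and sends one shard per worker, ingress and egress per shard are symmetric, which is consistent with the claim's equality of total bandwidth between the worker set and the reduction server.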