CPC G06F 9/45558 (2013.01) [G06N 3/08 (2013.01); G06N 3/084 (2013.01); G06F 2009/45562 (2013.01); G06F 2009/4557 (2013.01); G06N 3/098 (2023.01)] | 19 Claims |
15. A data processing method for synchronous distributed training, comprising:
instantiating a first set of virtual machines running one or more worker processes, each worker process operating on a respective data block to each produce a respective gradient;
instantiating a second set of virtual machines that comprise a reduction server, the second set of virtual machines being stateless and running one or more reducer processes that operate on at least a portion of each respective gradient to generate an aggregated gradient;
configuring the second set of virtual machines such that computations or communications are balanced across the one or more reducer processes; and
broadcasting, by the reduction server, the aggregated gradient to the first set of virtual machines, and
wherein a first total bandwidth associated with use of the first set of virtual machines is equal to a second total bandwidth associated with use of the reduction server.
|
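The data flow recited in claim 15 can be illustrated with a minimal single-process sketch. It is not the patented implementation: plain Python lists stand in for the virtual machines, the aggregation is assumed to be an element-wise sum, and balancing is assumed to mean splitting each gradient into near-equal contiguous shards, one per reducer. All function names (`shard`, `reduce_shard`, `synchronous_step`, `reducers_for_equal_bandwidth`) are hypothetical and chosen only for illustration.

```python
from typing import List


def shard(gradient: List[float], num_reducers: int) -> List[List[float]]:
    """Split a gradient into num_reducers contiguous shards of near-equal size.

    Assumption: equal-sized shards are one way to balance reducer load.
    """
    base, extra = divmod(len(gradient), num_reducers)
    shards, start = [], 0
    for r in range(num_reducers):
        size = base + (1 if r < extra else 0)
        shards.append(gradient[start:start + size])
        start += size
    return shards


def reduce_shard(shards_from_workers: List[List[float]]) -> List[float]:
    """Stateless reducer: element-wise sum of one shard across all workers.

    Assumption: "aggregated" is taken to mean a sum; no state is kept
    between calls.
    """
    return [sum(values) for values in zip(*shards_from_workers)]


def synchronous_step(worker_gradients: List[List[float]],
                     num_reducers: int) -> List[List[float]]:
    """One synchronous step: shard, reduce, then broadcast to every worker."""
    # Each worker splits its gradient so that reducer r receives shard r.
    per_worker = [shard(g, num_reducers) for g in worker_gradients]
    aggregated: List[float] = []
    for r in range(num_reducers):
        aggregated.extend(reduce_shard([ws[r] for ws in per_worker]))
    # Broadcast: every worker receives its own copy of the same result.
    return [list(aggregated) for _ in worker_gradients]


def reducers_for_equal_bandwidth(num_workers: int, worker_bw: int,
                                 reducer_bw: int) -> int:
    """Reducer count m such that num_workers * worker_bw == m * reducer_bw.

    Bandwidths are integers in the same (arbitrary) unit.
    """
    total = num_workers * worker_bw
    if total % reducer_bw != 0:
        raise ValueError("total worker bandwidth is not a multiple of reducer_bw")
    return total // reducer_bw


if __name__ == "__main__":
    gradients = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
    print(synchronous_step(gradients, num_reducers=2))
    print(reducers_for_equal_bandwidth(num_workers=4, worker_bw=100, reducer_bw=50))
```

The last helper reflects the final wherein clause under the assumption that each virtual machine in a set has the same per-machine bandwidth, so the two totals match when the number of workers times the worker bandwidth equals the number of reducers times the reducer bandwidth.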