US 11,868,817 B2
Load balancing method, apparatus and device for parallel model training task, and storage medium
Li Wang, Shandong (CN); Kai Gao, Shandong (CN); Fang Cao, Shandong (CN); and Zhenhua Guo, Shandong (CN)
Assigned to INSPUR ELECTRONIC INFORMATION INDUSTRY CO., LTD., Shandong (CN)
Appl. No. 18/010,725
Filed by INSPUR ELECTRONIC INFORMATION INDUSTRY CO., LTD., Shandong (CN)
PCT Filed Feb. 20, 2021, PCT No. PCT/CN2021/076963
§ 371(c)(1), (2) Date Dec. 15, 2022,
PCT Pub. No. WO2022/001134, PCT Pub. Date Jan. 6, 2022.
Claims priority of application No. 202010597645.3 (CN), filed on Jun. 28, 2020.
Prior Publication US 2023/0195537 A1, Jun. 22, 2023
Int. Cl. G06F 9/50 (2006.01)
CPC G06F 9/5083 (2013.01) [G06F 2209/5022 (2013.01)] 14 Claims
OG exemplary drawing
 
1. A load balancing method for a parallel model training task, comprising:
acquiring data traffic and a theoretical computational amount of each of a plurality of network layers in a target model, wherein the theoretical computational amount is a theoretical total amount of computing resources required for training a network layer;
determining a theoretical computing capability of each of a plurality of computing devices, and obtaining an initial computational amount corresponding to each of the plurality of computing devices according to the theoretical computing capability and the theoretical computational amount respectively, wherein the theoretical computing capability represents a computing speed of a computing device;
performing a load balancing operation according to the initial computational amount by using a multiple device critical layer position division rule, so as to obtain a plurality of initial balancing schemes, wherein the performing a load balancing operation according to the initial computational amount comprises:
dividing the plurality of network layers to each of the plurality of computing devices according to the initial computational amount in network layer order, and detecting a device critical layer;
in response to detecting the device critical layer:
dividing the device critical layer to a preceding computing device, so as to obtain a first balancing scheme, wherein the preceding computing device is the computing device to which a preceding network layer corresponding to the device critical layer belongs;
dividing the device critical layer to a subsequent computing device, so as to obtain a second balancing scheme, wherein the subsequent computing device is the computing device to which a subsequent network layer corresponding to the device critical layer belongs; and
determining the first balancing scheme and the second balancing scheme as the plurality of initial balancing schemes;
compiling statistics on time performance parameters corresponding to each of the plurality of initial balancing schemes respectively, and determining an intermediate balancing scheme from the plurality of initial balancing schemes according to the respective time performance parameters,
wherein the compiling statistics further comprises:
compiling statistics on a computing time corresponding to each of the plurality of computing devices in each of the plurality of initial balancing schemes, and calculating a time average and a time standard deviation corresponding to each of the plurality of initial balancing schemes respectively by using the respective computing times, so as to obtain the time performance parameters;
determining whether the time average is less than a first threshold and whether the time standard deviation is less than a second threshold;
determining one or more from the plurality of initial balancing schemes for which the time average is less than the first threshold and the time standard deviation is less than the second threshold as one or more candidate balancing schemes;
when there is one candidate balancing scheme, determining the one candidate balancing scheme as the intermediate balancing scheme; and
when there are a plurality of candidate balancing schemes, selecting one candidate balancing scheme as the intermediate balancing scheme from the plurality of candidate balancing schemes according to a preset selection rule;
adjusting the intermediate balancing scheme according to the data traffic, to obtain a final balancing scheme;
splitting the target model according to the final balancing scheme, to obtain a plurality of network layer groups, and sending each network layer group to the corresponding computing device of the plurality of computing devices; and
training each network layer group by the corresponding computing device of the plurality of computing devices.
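 
The division step of the claim can be illustrated with a short sketch. The Python code below is a minimal, hypothetical illustration of the multiple device critical layer position division rule: each device receives an initial computational amount (quota), layers are divided to devices in layer order, and the layer that straddles a quota boundary (the device critical layer) is divided either to the preceding device or to the subsequent device, yielding the first and second balancing schemes. The names (initial_amounts, greedy_divide, layer_flops, device_speeds) are not from the patent, and the quota proportional to each device's theoretical computing capability is an assumption consistent with the claim, not the disclosed implementation.

from typing import List, Tuple


def initial_amounts(layer_flops: List[float], device_speeds: List[float]) -> List[float]:
    # Initial computational amount (quota) of each device, assumed proportional
    # to its theoretical computing capability (speed).
    total = sum(layer_flops)
    speed_sum = sum(device_speeds)
    return [total * speed / speed_sum for speed in device_speeds]


def greedy_divide(layer_flops: List[float], quotas: List[float],
                  critical_to_preceding: bool) -> List[int]:
    # Divide network layers to devices in layer order.  A layer that would
    # overflow the current device's quota is treated as a device critical layer
    # and is divided to the preceding (current) device or to the subsequent
    # device, depending on the flag.
    assignment: List[int] = []
    dev, used = 0, 0.0
    for flops in layer_flops:
        if dev < len(quotas) - 1 and used + flops > quotas[dev]:
            if critical_to_preceding:
                assignment.append(dev)      # critical layer stays on the preceding device
                dev, used = dev + 1, 0.0
                continue
            dev, used = dev + 1, 0.0        # critical layer moves to the subsequent device
        assignment.append(dev)
        used += flops
    return assignment


def initial_balancing_schemes(layer_flops: List[float],
                              device_speeds: List[float]) -> Tuple[List[int], List[int]]:
    # Returns the first and second balancing schemes as per-layer device indices.
    quotas = initial_amounts(layer_flops, device_speeds)
    first = greedy_divide(layer_flops, quotas, critical_to_preceding=True)
    second = greedy_divide(layer_flops, quotas, critical_to_preceding=False)
    return first, second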
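 
The scheme-selection step can be sketched in the same spirit. The code below computes, for each initial balancing scheme, the computing time of every device (assigned computational amount divided by the device's speed), takes the time average and time standard deviation as the time performance parameters, keeps the schemes that fall below both thresholds, and picks one candidate. The claim does not disclose the preset selection rule; the "smallest standard deviation, then smallest average" tiebreak used here is purely an assumption for illustration.

import statistics
from typing import List, Optional


def device_times(assignment: List[int], layer_flops: List[float],
                 device_speeds: List[float]) -> List[float]:
    # Computing time of each device: its assigned computational amount divided
    # by its theoretical computing capability (speed).
    loads = [0.0] * len(device_speeds)
    for layer, dev in enumerate(assignment):
        loads[dev] += layer_flops[layer]
    return [load / speed for load, speed in zip(loads, device_speeds)]


def select_intermediate_scheme(schemes: List[List[int]],
                               layer_flops: List[float],
                               device_speeds: List[float],
                               avg_threshold: float,
                               std_threshold: float) -> Optional[List[int]]:
    # Keep the schemes whose time average and time standard deviation are both
    # below the thresholds; among the remaining candidates, apply an assumed
    # preset selection rule (smallest standard deviation, then smallest average).
    candidates = []
    for scheme in schemes:
        times = device_times(scheme, layer_flops, device_speeds)
        avg = statistics.mean(times)
        std = statistics.pstdev(times)
        if avg < avg_threshold and std < std_threshold:
            candidates.append((std, avg, scheme))
    if not candidates:
        return None
    candidates.sort(key=lambda c: (c[0], c[1]))
    return candidates[0][2]

For example, with layer_flops = [4, 3, 5, 2, 6] and device_speeds = [1.0, 1.0], the first scheme is [0, 0, 0, 1, 1] (device times 12 and 8) and the second is [0, 0, 1, 1, 1] (device times 7 and 13); both have a time average of 10, and the first has the smaller standard deviation, so it would be selected when both pass the thresholds.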