US 12,436,798 B2
Method, electronic device, and computer program product for distributed data processing
Jinpeng Liu, Shanghai (CN); Zijia Wang, WeiFang (CN); Zhen Jia, Shanghai (CN); and Jiacheng Ni, Shanghai (CN)
Assigned to Dell Products L.P., Round Rock, TX (US)
Filed by Dell Products L.P., Round Rock, TX (US)
Filed on Aug. 9, 2022, as Appl. No. 17/884,118.
Claims priority of application No. 202210873507.2 (CN), filed on Jul. 22, 2022.
Prior Publication US 2024/0028384 A1, Jan. 25, 2024
Int. Cl. G06F 9/44 (2018.01); G06F 9/48 (2006.01); G06N 20/00 (2019.01)

CPC G06F 9/48 (2013.01) [G06N 20/00 (2019.01)]

20 Claims

1. A method for distributed data processing, comprising:

obtaining an input for a data processing task based on a multi-head attention mechanism, the data processing task comprising a first subtask and a second subtask, the first subtask corresponding to a first attention head in the multi-head attention mechanism, the first attention head being implemented using a first dedicated computing resource, and the second subtask corresponding to a second attention head in the multi-head attention mechanism, the second attention head being implemented using a second dedicated computing resource different than the first dedicated computing resource, wherein the first subtask and the second subtask are exclusively associated with the respective first attention head and second attention head implemented by the respective first dedicated computing resource and second dedicated computing resource, and wherein there is a one-to-one correspondence between the first attention head and the first dedicated computing resource and a one-to-one correspondence between the second attention head and the second dedicated computing resource;

transmitting the input to the first dedicated computing resource and the second dedicated computing resource, for performance of the respective first subtask and second subtask utilizing the respective first attention head and second attention head, the first dedicated computing resource corresponding to the first subtask, and the second dedicated computing resource corresponding to the second subtask; and

performing the first subtask and the second subtask on the input, utilizing the respective first attention head implemented by the first dedicated computing resource and the second attention head implemented by the second dedicated computing resource, for obtaining an output of the data processing task.
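
The three steps of claim 1 (obtaining an input, transmitting it to each dedicated computing resource, and performing each subtask with its own attention head) can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that each "dedicated computing resource" can be modeled as a single-worker executor bound to exactly one head, that each head computes standard scaled dot-product attention, and that the task output is formed by concatenating the head outputs; the claim itself does not specify any of these details. The names `DedicatedResource`, `attention_head`, and `run_task` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(x, wq, wk, wv):
    """Scaled dot-product attention for one head (assumed head computation)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(k[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)


class DedicatedResource:
    """Hypothetical stand-in for one device: one worker, bound to exactly one head."""

    def __init__(self, name, head_weights):
        self.name = name
        self.head_weights = head_weights  # (wq, wk, wv) for this head only
        self._executor = ThreadPoolExecutor(max_workers=1)

    def submit(self, x):
        # "Transmitting the input": the resource receives the full input.
        return self._executor.submit(attention_head, x, *self.head_weights)

    def shutdown(self):
        self._executor.shutdown()


def run_task(x, resources):
    """Send the same input to every resource, then combine the head outputs."""
    futures = [r.submit(x) for r in resources]    # one subtask per resource
    head_outputs = [f.result() for f in futures]  # heads run independently
    # Assumption: the task output is the row-wise concatenation of head outputs.
    return [sum((h[i] for h in head_outputs), []) for i in range(len(x))]


# Toy input: two tokens with two features each.
x = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
doubled = [[2.0, 0.0], [0.0, 2.0]]

resources = [
    DedicatedResource("resource-1", (identity, identity, identity)),  # first head
    DedicatedResource("resource-2", (doubled, doubled, doubled)),     # second head
]
output = run_task(x, resources)
for r in resources:
    r.shutdown()
```

Because each `DedicatedResource` holds the weights of only one head and runs only that head's subtask, the sketch mirrors the one-to-one correspondence between attention heads and dedicated computing resources recited in the claim. In a real deployment the executor would be replaced by an actual device or node.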