US 12,346,403 B2
Distributing matrix multiplication processing among processing nodes
Aaron M. Collier, Bloomington, MN (US)
Assigned to Hewlett Packard Enterprise Development LP, Spring, TX (US)
Filed by Hewlett Packard Enterprise Development LP, Spring, TX (US)
Filed on Jun. 5, 2024, as Appl. No. 18/734,123.
Application 18/734,123 is a division of application No. 18/189,625, filed on Mar. 24, 2023, granted, now 12,061,666.
Application 18/189,625 is a division of application No. 16/886,189, filed on May 28, 2020, granted, now 11,640,443, issued on May 2, 2023.
Prior Publication US 2024/0320300 A1, Sep. 26, 2024
Int. Cl. G06F 17/16 (2006.01); G06F 9/50 (2006.01); G06F 9/54 (2006.01)
CPC G06F 17/16 (2013.01) [G06F 9/5066 (2013.01); G06F 9/544 (2013.01)]
15 Claims
 
11. An apparatus comprising:
at least one hardware processor; and
a memory to store instructions that, when executed by the at least one hardware processor, cause the at least one hardware processor to:
partition matrices associated with a matrix multiplication into a plurality of partitions that corresponds to a plurality of processor sockets;
size the plurality of partitions based on first costs corresponding to candidate matrix decompositions to provide a plurality of sized partitions, wherein a given first cost of the first costs is based on a first comparative analysis of first block sizes of the corresponding candidate matrix decomposition along a first dimension and second block sizes of the corresponding candidate matrix decomposition along a second dimension;
assign the plurality of sized partitions to the plurality of processor sockets in a manner that a given set of sized partitions of the plurality of sized partitions is assigned to a given processor socket of the plurality of processor sockets;
for the given set of sized partitions, subdivide the partitions of the given set based on second costs corresponding to candidate matrix sub-decompositions to provide a plurality of sized sub-partitions, wherein a given second cost of the second costs is based on a second comparative analysis of first sub-block sizes of the corresponding candidate matrix sub-decomposition along the first dimension and second sub-block sizes of the corresponding candidate matrix sub-decomposition along the second dimension; and
assign the plurality of sized sub-partitions to a plurality of processing nodes of the given processor socket for performing processing of the multiplication.
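
The two-level scheme recited in claim 11 can be illustrated with a minimal sketch. It is not the patented implementation, and the claim does not define the cost function. The sketch assumes a cost equal to the absolute difference between the largest block size along the first dimension and the largest block size along the second dimension, so that near-square blocks are preferred. It also assumes that each socket receives exactly one partition of the output matrix, and that candidate decompositions are the factor pairs of the worker count. All function names (`block_sizes`, `cost`, `best_grid`, `decompose`, `two_level_plan`) are hypothetical.

```python
def factor_pairs(n):
    """Return every (p, q) with p * q == n; each pair is a candidate grid."""
    return [(p, n // p) for p in range(1, n + 1) if n % p == 0]


def block_sizes(extent, parts):
    """Split `extent` into `parts` near-equal block sizes."""
    base, extra = divmod(extent, parts)
    return [base + 1 if i < extra else base for i in range(parts)]


def cost(row_blocks, col_blocks):
    """Assumed cost: compare block sizes along the two dimensions.

    A lower value means the largest blocks are closer to square.
    """
    return abs(max(row_blocks) - max(col_blocks))


def best_grid(m, n, workers):
    """Pick the candidate (rows, cols) grid with the lowest assumed cost."""
    return min(
        factor_pairs(workers),
        key=lambda pq: cost(block_sizes(m, pq[0]), block_sizes(n, pq[1])),
    )


def split_ranges(start, extent, parts):
    """Turn block sizes into half-open (begin, end) index ranges."""
    ranges, offset = [], start
    for size in block_sizes(extent, parts):
        ranges.append((offset, offset + size))
        offset += size
    return ranges


def decompose(row_range, col_range, workers):
    """Partition a rectangular region among `workers` using the best grid."""
    m = row_range[1] - row_range[0]
    n = col_range[1] - col_range[0]
    p, q = best_grid(m, n, workers)
    rows = split_ranges(row_range[0], m, p)
    cols = split_ranges(col_range[0], n, q)
    return [(r, c) for r in rows for c in cols]


def two_level_plan(m, n, sockets, nodes_per_socket):
    """First level: one partition per socket. Second level: sub-partitions
    of that partition, one per processing node of the socket."""
    plan = {}
    for socket_id, (r, c) in enumerate(decompose((0, m), (0, n), sockets)):
        plan[socket_id] = decompose(r, c, nodes_per_socket)
    return plan


if __name__ == "__main__":
    # Example: an 8 x 8 output matrix, 4 sockets, 2 processing nodes each.
    for socket_id, subparts in two_level_plan(8, 8, 4, 2).items():
        print(socket_id, subparts)
```

The same `decompose` routine is reused at both levels, which mirrors the claim's structure: a first cost selects the socket-level decomposition, and a second cost selects the sub-decomposition within each socket. A real system would likely use a richer cost, for example one that accounts for memory traffic or for the shared inner dimension, which this sketch ignores.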