CPC G06F 17/16 (2013.01) [G06F 9/5066 (2013.01); G06F 9/544 (2013.01)]; 15 Claims

11. An apparatus comprising:
at least one hardware processor; and
a memory to store instructions that, when executed by the at least one hardware processor, cause the at least one hardware processor to:
partition matrices associated with a matrix multiplication into a plurality of partitions that corresponds to a plurality of processor sockets;
size the plurality of partitions based on first costs corresponding to candidate matrix decompositions to provide a plurality of sized partitions, wherein a given first cost of the first costs is based on a first comparative analysis of first block sizes of the corresponding candidate matrix decomposition along a first dimension and second block sizes of the corresponding candidate matrix decomposition along a second dimension;
assign the plurality of sized partitions to the plurality of processor sockets such that a given set of sized partitions of the plurality of sized partitions is assigned to a given processor socket of the plurality of processor sockets;
for the given set of sized partitions, subdivide the partitions of the given set based on second costs corresponding to candidate matrix sub-decompositions to provide a plurality of sized sub-partitions, wherein a given second cost of the second costs is based on a second comparative analysis of first sub-block sizes of the corresponding candidate matrix sub-decomposition along the first dimension and second sub-block sizes of the corresponding candidate matrix sub-decomposition along the second dimension; and
assign the plurality of sized sub-partitions to a plurality of processing nodes of the given processor socket for performing processing of the matrix multiplication.
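The claim recites a two-level, cost-driven decomposition but does not fix the cost function, the candidate shapes, or how A and B are split. The Python sketch below shows one possible reading and rests on several assumptions. First, it tiles only the M×N output matrix C = A·B, so each tile implicitly needs the matching rows of A and columns of B, and K is not split. Second, each candidate decomposition is a pr × pc grid with pr · pc equal to the number of targets. Third, the "comparative analysis" of block sizes along the two dimensions is modeled as the ratio of the larger block extent to the smaller, so near-square blocks score lowest. Fourth, each socket receives a set containing a single partition. All names (`Block`, `grid_cost`, `decompose`, `hierarchical_assign`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Block:
    """Rectangular tile of the output matrix C: rows [r0, r1), columns [c0, c1)."""
    r0: int
    r1: int
    c0: int
    c1: int


def factor_pairs(p: int) -> List[Tuple[int, int]]:
    """Candidate decompositions: every (pr, pc) grid with pr * pc == p."""
    return [(pr, p // pr) for pr in range(1, p + 1) if p % pr == 0]


def split(lo: int, hi: int, parts: int) -> List[Tuple[int, int]]:
    """Split [lo, hi) into `parts` contiguous ranges whose lengths differ by at most 1."""
    base, extra = divmod(hi - lo, parts)
    ranges, start = [], lo
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def grid_cost(rows: int, cols: int, pr: int, pc: int) -> float:
    """Hypothetical cost: compare the block size along the first dimension with
    the block size along the second. 1.0 means square blocks; larger is more elongated."""
    br = -(-rows // pr)  # ceil(rows / pr): largest block extent along dimension 1
    bc = -(-cols // pc)  # ceil(cols / pc): largest block extent along dimension 2
    return max(br, bc) / min(br, bc)


def decompose(block: Block, p: int) -> List[Block]:
    """Pick the lowest-cost candidate grid for `block` and return its p tiles."""
    rows, cols = block.r1 - block.r0, block.c1 - block.c0
    # Only keep grids that leave every tile non-empty.
    candidates = [(pr, pc) for pr, pc in factor_pairs(p) if pr <= rows and pc <= cols]
    if not candidates:
        raise ValueError(f"cannot split a {rows}x{cols} block into {p} non-empty tiles")
    pr, pc = min(candidates, key=lambda g: grid_cost(rows, cols, g[0], g[1]))
    return [
        Block(r0, r1, c0, c1)
        for r0, r1 in split(block.r0, block.r1, pr)
        for c0, c1 in split(block.c0, block.c1, pc)
    ]


def hierarchical_assign(m: int, n: int, sockets: int,
                        nodes_per_socket: int) -> Dict[Tuple[int, int], Block]:
    """Level 1: size one partition per socket. Level 2: subdivide each socket's
    partition into one sub-partition per processing node of that socket."""
    plan: Dict[Tuple[int, int], Block] = {}
    for socket, part in enumerate(decompose(Block(0, m, 0, n), sockets)):
        for node, sub in enumerate(decompose(part, nodes_per_socket)):
            plan[(socket, node)] = sub
    return plan


if __name__ == "__main__":
    for (socket, node), tile in sorted(hierarchical_assign(1000, 600, 2, 4).items()):
        print(f"socket {socket} / node {node}: {tile}")
```

With a 1000×600 output, 2 sockets and 4 nodes per socket, the first level prefers a 2×1 grid (500×600 blocks, cost 1.2) over 1×2 (1000×300 blocks, cost about 3.33). Each socket then reapplies the same cost to its own partition and settles on a 2×2 grid of 250×300 tiles. Reusing one cost function at both levels is a simplification; a real implementation might weigh cache sizes or inter-socket traffic differently at each level.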