| CPC G06F 9/505 (2013.01) [G06F 9/5044 (2013.01); G06F 11/3428 (2013.01); G06F 2201/80 (2013.01); G06F 2209/5019 (2013.01); G06F 2209/505 (2013.01)] | 20 Claims |

|
1. A system comprising:
at least one processor programmed to:
collect data associated with a plurality of workload performance profiling counters associated with a workload during one or more executions of the workload in a High Performance Computing (HPC) system comprising multiple nodes each having a plurality of tunable hardware execution parameters;
store, in an architecture-specific knowledge database, a class fingerprint for a workload class of which the workload is a member, the class fingerprint being based on a workload-specific fingerprint of the workload according to a machine-learning (ML) technique, the workload-specific fingerprint being based on the collected data;
store, in the architecture-specific knowledge database, in association with the class fingerprint, optimal settings for the plurality of tunable hardware execution parameters measured against a specified optimization metric based on at least one variation of at least a portion of the plurality of tunable hardware execution parameters in the HPC system during the one or more executions of the workload; and
during an initial execution of a given workload on a given HPC system, vary the plurality of tunable hardware execution parameters of each node of the given HPC system that is executing the workload to the optimal settings stored in the architecture-specific knowledge database in association with the class fingerprint for continued execution of the given workload on the given HPC system resulting in improved execution of the given workload relative to the initial execution, wherein the given HPC system is the HPC system or another HPC system, wherein the given workload is the workload or another workload, and wherein the given workload has a workload-specific fingerprint corresponding to the class fingerprint.
|
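
The data flow recited in claim 1 (collect counter data, derive a fingerprint, store a class fingerprint with its best-measured settings, then look up settings for a matching workload) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the claim: the counter names, the `KnowledgeDB` class, the normalized-mean fingerprint, and the distance threshold. In particular, nearest-fingerprint matching stands in for the unspecified machine-learning technique, and the sketch assumes a lower metric value (for example, runtime) is better.

```python
from dataclasses import dataclass, field
from math import sqrt

Counters = dict[str, float]   # hypothetical: counter name -> sampled value
Settings = dict[str, int]     # hypothetical: tunable parameter -> value


def fingerprint(samples: list[Counters], names: list[str]) -> tuple[float, ...]:
    """Average each counter over the samples, then scale to unit length."""
    means = [sum(s[n] for s in samples) / len(samples) for n in names]
    norm = sqrt(sum(m * m for m in means)) or 1.0
    return tuple(m / norm for m in means)


def distance(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Euclidean distance between two fingerprints of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_settings(trials: list[tuple[Settings, float]]) -> Settings:
    """Pick the trial settings with the lowest metric (assumed: lower is better)."""
    return min(trials, key=lambda t: t[1])[0]


@dataclass
class KnowledgeDB:
    """Toy stand-in for an architecture-specific knowledge database."""
    architecture: str
    entries: dict[str, tuple[tuple[float, ...], Settings]] = field(default_factory=dict)

    def store(self, class_name: str, class_fp: tuple[float, ...], settings: Settings) -> None:
        self.entries[class_name] = (class_fp, dict(settings))

    def lookup(self, workload_fp: tuple[float, ...], max_distance: float = 0.2):
        """Return (class name, settings) for the closest class, or None if too far."""
        best = min(
            self.entries.items(),
            key=lambda kv: distance(kv[1][0], workload_fp),
            default=None,
        )
        if best is None or distance(best[1][0], workload_fp) > max_distance:
            return None
        return best[0], best[1][1]


# Profiling runs: counters collected while settings were varied.
NAMES = ["ipc", "mem_bw", "cache_miss"]
runs = [
    {"ipc": 0.5, "mem_bw": 8.0, "cache_miss": 2.0},
    {"ipc": 0.7, "mem_bw": 10.0, "cache_miss": 2.0},
]
trials = [
    ({"prefetch": 0, "freq_mhz": 2000}, 120.0),
    ({"prefetch": 1, "freq_mhz": 2400}, 95.0),
]

db = KnowledgeDB("arch-a")
db.store("memory-bound", fingerprint(runs, NAMES), best_settings(trials))

# Initial execution of a later workload: fingerprint it, then look up settings.
new_fp = fingerprint([{"ipc": 0.6, "mem_bw": 9.5, "cache_miss": 2.1}], NAMES)
match = db.lookup(new_fp)
```

In this sketch `match` carries the stored settings that would be applied to each node for the continued execution. A workload whose fingerprint is farther than `max_distance` from every stored class gets `None`, so no settings are changed for it.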