US 12,112,207 B2
Selecting nodes in a cluster of nodes for running computational jobs
David Strenski, Ypsilanti, MI (US)
Assigned to Hewlett Packard Enterprise Development LP, Spring, TX (US)
Filed by Hewlett Packard Enterprise Development LP, Houston, TX (US)
Filed on Apr. 9, 2021, as Appl. No. 17/226,296.
Prior Publication US 2022/0326993 A1, Oct. 13, 2022
Int. Cl. G06F 9/50 (2006.01)
CPC G06F 9/505 (2013.01) [G06F 9/5044 (2013.01); G06F 9/5072 (2013.01); G06F 9/5083 (2013.01)]
19 Claims
 
1. A method comprising: gathering, by a processing element of a scheduler node, information about a cluster of nodes in a high-performance computing system, wherein the high-performance computing system is in a production state with one or more computational workloads getting executed thereon; periodically sending, by the processing element, one or more test-computing jobs for execution on each node, of the cluster of nodes, to measure one or more performance metrics thereof; receiving, by the processing element, measured performance metrics from each node in response to the one or more test-computing jobs executed thereon; recording, by the processing element, in a database, the measured performance metrics received from each node, wherein recording the measured performance metrics comprises: determining, by the processing element, whether to update the database by comparing the measured performance metrics of a current instance with the performance metrics recorded in the database at a previous instance, for each node of the cluster of nodes; and in response to determining, based on the comparison, a change in the performance metrics, updating, by the processing element, the database with the measured performance metrics of the current instance; receiving a request to run one or more computational jobs on the high-performance computing system; selecting, by the processing element based on the received request and the measured performance metrics recorded in the database, a set of nodes from the cluster of nodes for running the requested one or more computational jobs on the high-performance computing system; and sorting, by the processing element, the cluster of nodes in the fastest to slowest order of an actual processing speed based on a performance metric selected from the measured performance metrics.
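
The recording, sorting, and selecting steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything named here is hypothetical: the in-memory dictionary standing in for the database, the function names, the "gflops" metric, the assumption that a higher metric value means a faster node, and the policy of selecting the top-ranked nodes (the claim only requires that selection be based on the request and the recorded metrics).

```python
from typing import Dict, List

# Hypothetical stand-in for the database: node name -> {metric name: value}.
MetricsDb = Dict[str, Dict[str, float]]


def record_if_changed(db: MetricsDb, node: str, measured: Dict[str, float]) -> bool:
    """Update the stored metrics for a node only when they differ from the
    previously recorded instance. Returns True if the database was updated."""
    if db.get(node) == measured:
        return False  # no change since the previous instance, skip the write
    db[node] = dict(measured)  # store a copy of the current instance
    return True


def sort_fastest_first(db: MetricsDb, metric: str) -> List[str]:
    """Order nodes from fastest to slowest by one selected metric.
    Assumption: a larger metric value means a faster node."""
    return sorted(db, key=lambda node: db[node][metric], reverse=True)


def select_nodes(db: MetricsDb, metric: str, count: int) -> List[str]:
    """Pick the requested number of nodes. Taking the top-ranked nodes is
    one possible policy, assumed here for illustration only."""
    ranked = sort_fastest_first(db, metric)
    if count > len(ranked):
        raise ValueError("request asks for more nodes than the cluster has")
    return ranked[:count]


if __name__ == "__main__":
    db: MetricsDb = {}
    # Results of one round of test-computing jobs (made-up values).
    for name, value in [("n1", 100.0), ("n2", 120.0), ("n3", 90.0)]:
        record_if_changed(db, name, {"gflops": value})
    print(select_nodes(db, "gflops", 2))
```

In this sketch the comparison is an exact equality check. A real scheduler would more likely apply a tolerance so that measurement noise does not trigger a database write on every test cycle.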