US 12,366,903 B2
System and method of protecting workloads for GPU complex servers with liquid assisted air cooling
Chandrasekhar Mugunda, Austin, TX (US); Rui An, Austin, TX (US); Hsien-Tsung Lin, Taoyuan (TW); and Syamu Sajja, Leander, TX (US)
Assigned to Dell Products L.P., Round Rock, TX (US)
Filed by Dell Products L.P., Round Rock, TX (US)
Filed on Mar. 1, 2023, as Appl. No. 18/176,592.
Prior Publication US 2024/0295909 A1, Sep. 5, 2024
Int. Cl. G06F 1/20 (2006.01); G05B 19/416 (2006.01)
CPC G06F 1/20 (2013.01) [G05B 19/416 (2013.01); G05B 2219/49216 (2013.01)]
20 Claims
 
1. A data processing system for providing computer-implemented services using, at least in part, a graphics processing unit complex comprising graphics processors and pumps used to thermally manage the graphics processors, comprising:
a processor; and
a management controller that is programmed to:
make a first determination regarding whether actual operating conditions for the graphics processing unit complex are within a range of known good configuration data, the actual operating conditions being based on operation of the pumps;
in an instance of the first determination where the actual operating conditions are not within the range:
obtain a failure pattern based on the actual operating conditions;
make a second determination regarding whether the failure pattern matches a known failure pattern;
in an instance of the second determination where the failure pattern matches the known failure pattern:
obtain a type of failure for the known failure pattern;
make a third determination regarding whether the type of failure matches a known failure type;
in an instance of the third determination where the failure type matches the known failure type:
 obtain a failure response corresponding to the known failure type; and
 perform a remediation for the graphics processing unit complex using the failure response.
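
The three nested determinations recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the metric names (`pump_rpm`, `coolant_flow_lpm`), the numeric ranges, the pattern encoding, the failure type `pump_failure`, and the response `throttle_gpus` are all hypothetical values chosen for illustration. The claim does not say what happens when a determination fails, so returning `None` in those branches is an assumption.

```python
from typing import Callable, Dict, FrozenSet, Optional, Tuple

Pattern = FrozenSet[Tuple[str, str]]

# Hypothetical known good configuration data: (low, high) per pump metric.
KNOWN_GOOD: Dict[str, Tuple[float, float]] = {
    "pump_rpm": (3000.0, 6000.0),
    "coolant_flow_lpm": (2.0, 8.0),
}

# Hypothetical known failure patterns, each mapped to a type of failure.
KNOWN_PATTERNS: Dict[Pattern, str] = {
    frozenset({("pump_rpm", "low"), ("coolant_flow_lpm", "low")}): "pump_failure",
}

# Hypothetical known failure types, each mapped to a failure response.
KNOWN_RESPONSES: Dict[str, str] = {"pump_failure": "throttle_gpus"}


def failure_pattern(conditions: Dict[str, float]) -> Pattern:
    """Encode each out-of-range metric as a (name, direction) pair."""
    deviations = set()
    for name, value in conditions.items():
        low, high = KNOWN_GOOD[name]
        if value < low:
            deviations.add((name, "low"))
        elif value > high:
            deviations.add((name, "high"))
    return frozenset(deviations)


def manage(
    conditions: Dict[str, float],
    remediate: Callable[[str], None],
) -> Optional[str]:
    """Run the three determinations; return the response applied, if any."""
    # First determination: are the operating conditions within range?
    pattern = failure_pattern(conditions)
    if not pattern:
        return None  # within range, nothing to do

    # Second determination: does the pattern match a known failure pattern?
    failure_type = KNOWN_PATTERNS.get(pattern)
    if failure_type is None:
        return None  # assumption: unknown patterns are not handled here

    # Third determination: does the type match a known failure type?
    if failure_type not in KNOWN_RESPONSES:
        return None  # assumption: unknown types are not handled here

    # Obtain the failure response and perform the remediation with it.
    response = KNOWN_RESPONSES[failure_type]
    remediate(response)
    return response
```

A caller would pass the current pump readings and a callback that carries out the named response, for example `manage({"pump_rpm": 1200.0, "coolant_flow_lpm": 0.5}, print)`. Keeping the pattern table and the response table separate mirrors the claim, which treats matching a failure pattern and matching a failure type as distinct determinations.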