US 12,282,662 B2
Chassis servicing and migration in a scale-up NUMA system
Thomas Edward McGee, Eau Claire, WI (US); Brian J. Johnson, Rosemount, MN (US); Frank R. Dropps, Annandale, MN (US); Derek S. Schumacher, Naperville, IL (US); Stuart C. Haden, Lucas, TX (US); and Michael S. Woodacre, Winchester (GB)
Assigned to Hewlett Packard Enterprise Development LP, Spring, TX (US)
Filed by Hewlett Packard Enterprise Development LP, Houston, TX (US)
Filed on Aug. 29, 2022, as Appl. No. 17/898,189.
Prior Publication US 2024/0069742 A1, Feb. 29, 2024
Int. Cl. G06F 3/06 (2006.01); G06F 12/0817 (2016.01)
CPC G06F 3/0617 (2013.01) [G06F 3/0647 (2013.01); G06F 3/0679 (2013.01); G06F 12/0828 (2013.01); G06F 2212/271 (2013.01); G06F 2212/621 (2013.01)]
18 Claims
 
1. A method for replacing a failing node with a spare node in a non-uniform memory access (NUMA) system, the method comprising:
in response to determining that a node-migration condition is met, initializing a node controller of the spare node such that accesses to a memory local to the spare node are to be processed by the node controller;
quiescing the failing node and the spare node to allow state information of processors on the failing node to be migrated to processors on the spare node;
subsequent to unquiescing the failing node and the spare node, migrating data from the failing node to the spare node while maintaining cache coherence in the NUMA system and while the NUMA system remains in operation, thereby facilitating continuous execution of processes previously executed on the failing node;
maintaining, at the node controller of the spare node, a partial directory of the local memory;
wherein initializing the node controller comprises marking every cache line in the local memory as corrupted; and
wherein migrating the data comprises, in response to determining that a requested cache line in the local memory of the spare node is marked as corrupted, coherently fetching the cache line from the failing node and writing the fetched cache line to the local memory of the spare node.
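
The sequence recited in claim 1 can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the patented implementation: the class names (`FailingNode`, `SpareNode`), the `CORRUPTED` sentinel, the `coherent_read` helper, and the `fetches` counter are all hypothetical, and the partial directory and the coherence protocol itself are not modeled. The sketch shows only the ordering of the steps and the fetch-on-corrupted behavior: every line on the spare node starts out marked as corrupted, and a line is copied from the failing node the first time it is requested.

```python
from dataclasses import dataclass, field

# Hypothetical sentinel standing in for a "corrupted" mark on a cache line.
# The claim does not say how the mark is encoded; this is an assumption.
CORRUPTED = object()


@dataclass
class FailingNode:
    memory: dict                      # cache-line address -> data
    cpu_state: dict = field(default_factory=dict)
    quiesced: bool = False

    def coherent_read(self, addr):
        # Stand-in for a coherent fetch across the NUMA fabric.
        return self.memory[addr]


@dataclass
class SpareNode:
    num_lines: int
    memory: dict = field(default_factory=dict)
    cpu_state: dict = field(default_factory=dict)
    quiesced: bool = False
    fetches: int = 0                  # counts lines pulled from the failing node

    def init_node_controller(self):
        # Initialization step: mark every local cache line as corrupted.
        self.memory = {addr: CORRUPTED for addr in range(self.num_lines)}

    def read(self, addr, source):
        # Migration step: a request that hits a corrupted line triggers a
        # fetch from the failing node and a write into local memory.
        if self.memory[addr] is CORRUPTED:
            self.memory[addr] = source.coherent_read(addr)
            self.fetches += 1
        return self.memory[addr]


def migrate(failing, spare):
    """Run the claimed steps in order; data itself moves lazily afterwards."""
    spare.init_node_controller()
    failing.quiesced = spare.quiesced = True      # quiesce both nodes
    spare.cpu_state = dict(failing.cpu_state)     # move processor state
    failing.quiesced = spare.quiesced = False     # unquiesce both nodes


old = FailingNode(memory={0: b"a", 1: b"b", 2: b"c"}, cpu_state={"pc": 0x1000})
new = SpareNode(num_lines=3)
migrate(old, new)
first = new.read(1, old)    # line is corrupted, so it is fetched from `old`
second = new.read(1, old)   # line is now local, so no second fetch occurs
```

In this model the processors resume on the spare node as soon as their state has been copied, and memory contents follow on demand, which mirrors the claim's requirement that the system remain in operation during data migration.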