US 12,326,811 B2
Fault tolerant systems and methods using shared memory configurations
Andrew Alden, Leominster, MA (US); Chester Pawlowski, Burlington, MA (US); Christopher Cotton, Nashua, NH (US); and John Chaves, Hudson, MA (US)
Assigned to STRATUS TECHNOLOGIES IRELAND LTD., (IE)
Filed by STRATUS TECHNOLOGIES IRELAND LTD., Dublin (IE)
Filed on Nov. 30, 2022, as Appl. No. 18/072,297.
Prior Publication US 2024/0176739 A1, May 30, 2024
Int. Cl. G06F 12/0815 (2016.01); G06F 11/20 (2006.01); G06F 12/0891 (2016.01)

CPC G06F 12/0815 (2013.01) [G06F 11/2025 (2013.01); G06F 11/2028 (2013.01); G06F 12/0891 (2013.01); G06F 2212/1032 (2013.01)]

21 Claims

1. A fault tolerant computer system comprising:

one or more shared memory complexes, each memory complex comprising a group of M computer-readable memory storage devices;

one or more cache coherent switches comprising two or more host ports and one or more downstream device ports, the cache coherent switch in electrical communication with the one or more shared memory complexes;

a first management processor in electrical communication with the cache coherent switch, the management processor comprising firmware configured to coordinate and assist with one or more failover functions;

an interconnect comprising one or more front-end interconnects and one or more back-end interconnects, wherein the one or more cache coherent switches are in electrical communication with the one or more back-end interconnects;

a first compute node comprising a first processor and a first cache, the first compute node in electrical communication with the one or more cache coherent switches and the one or more shared memory complexes, the first compute node configured to run an operating system and a customer application; and

a second compute node comprising a second processor and a second cache, the second compute node in electrical communication with the one or more cache coherent switches and the one or more shared memory complexes,

wherein the first compute node and the second compute node are in electrical communication with the one or more front-end interconnects,

wherein data stored in the one or more shared memory complexes by the first compute node and modified thereby because of operations executing on the first compute node is available for the second compute node to use and modify on a substantially real time basis in the event the second compute node takes over for the first node upon the first compute node undergoing a performance degradation event,

wherein the management processor is configured to signal that the second compute node is able to serve as the standby node and take over for the first compute node using the firmware,

wherein the second compute node runs the operating system and the customer application after a failover time.
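
The failover behavior recited in claim 1 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed hardware or its firmware: the class names (`SharedMemoryComplex`, `ComputeNode`, `ManagementProcessor`), the method names, and the `"counter"` key are all hypothetical, and the cache coherent switch, the caches, and the interconnects are not modeled. The sketch shows only the ordering of events: the active node writes to shared memory, the management processor signals that the standby node is ready, and after a degradation event the standby node continues from the same data.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SharedMemoryComplex:
    """Hypothetical stand-in for a shared memory complex visible to both nodes."""
    cells: Dict[str, int] = field(default_factory=dict)


@dataclass
class ComputeNode:
    """Hypothetical compute node; all application state lives in shared memory."""
    name: str
    memory: SharedMemoryComplex
    healthy: bool = True
    running_workload: bool = False

    def write(self, key: str, value: int) -> None:
        # Writes go to the shared complex, not to node-private storage.
        self.memory.cells[key] = value

    def read(self, key: str) -> int:
        return self.memory.cells[key]


class ManagementProcessor:
    """Hypothetical model of the firmware role: signal standby readiness
    and coordinate the switch of roles on a degradation event."""

    def __init__(self, active: ComputeNode, standby: ComputeNode) -> None:
        self.active = active
        self.standby = standby
        self.standby_ready = False
        self.active.running_workload = True

    def signal_standby_ready(self) -> None:
        # Assumption: readiness simply mirrors the standby node's health flag.
        self.standby_ready = self.standby.healthy

    def handle_degradation(self) -> ComputeNode:
        """Mark the active node degraded and hand the workload to the standby."""
        if not self.standby_ready:
            raise RuntimeError("no standby node has been signaled as ready")
        self.active.healthy = False
        self.active.running_workload = False
        self.standby.running_workload = True
        # Swap roles; the former active node is no longer a ready standby.
        self.active, self.standby = self.standby, self.active
        self.standby_ready = False
        return self.active


# Usage: node-a modifies shared data, degrades, and node-b takes over.
memory = SharedMemoryComplex()
node_a = ComputeNode("node-a", memory)
node_b = ComputeNode("node-b", memory)
mp = ManagementProcessor(active=node_a, standby=node_b)

node_a.write("counter", 41)
mp.signal_standby_ready()
new_active = mp.handle_degradation()
new_active.write("counter", new_active.read("counter") + 1)
```

Because both hypothetical nodes hold a reference to the same memory object, no copy or replay step appears in `handle_degradation`; that is the property the model is meant to highlight. A real system would also have to account for data still held in the failed node's cache, which this sketch deliberately leaves out.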