US 12,189,468 B2
Cloud scale server reliability management
Theodros Yigzaw, Sherwood, OR (US); John Holm, Beaverton, OR (US); Subhankar Panda, Portland, OR (US); Hugo Enrique Gonzalez Chavero, Tlaquepaque (MX); Satyaprakash Nanda, Portland, OR (US); Omar Avelar Suarez, Zapopan (MX); and Guarav Porwal, Portland, OR (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on May 27, 2021, as Appl. No. 17/332,302.
Prior Publication US 2021/0286667 A1, Sep. 16, 2021
Int. Cl. G06F 11/07 (2006.01)
CPC G06F 11/0793 (2013.01) [G06F 11/0751 (2013.01); G06F 11/0772 (2013.01)]
14 Claims
 
1. An electronic apparatus, comprising:
one or more substrates; and
a controller coupled to the one or more substrates, the controller including circuitry which is configured to execute firmware instructions to coordinate a management of a memory subsystem with an operating system (OS) which is to be executed with a host processor of a platform which is to comprise the controller and the memory subsystem, wherein the circuitry to coordinate the management comprises the circuitry to:
proactively provide to the OS a notification of a failure event at the memory subsystem;
receive a communication from the OS which indicates that, based on the notification, the OS is to temporarily map out a page of the memory subsystem which is related to the failure event;
in response to the communication:
release the page of the memory subsystem; and
initiate a self-repair action for the page; and
report to the OS a status of the self-repair action;
wherein the OS is to determine, based on the status, whether the page is to be reclaimed.
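
The sequence recited in claim 1 can be read as a short message exchange between the controller firmware and the OS. The following is a minimal sketch of that exchange, not the patented implementation. All names (`HostOS`, `Controller`, `RepairStatus`, `repair_fn`) are hypothetical, the repair itself is a caller-supplied stub, and the policy of retiring a page whose repair failed is an assumption; the claim only states that the OS decides, based on the status, whether the page is to be reclaimed.

```python
from dataclasses import dataclass, field
from enum import Enum


class RepairStatus(Enum):
    """Hypothetical status values reported by the controller."""
    SUCCESS = "success"
    FAILED = "failed"


@dataclass
class HostOS:
    """Toy OS model that tracks mapped-out and retired pages."""
    mapped_out: set = field(default_factory=set)
    retired: set = field(default_factory=set)

    def on_failure_notification(self, page):
        # Temporarily map out the affected page, then tell the
        # controller which page was mapped out.
        self.mapped_out.add(page)
        return page

    def on_repair_status(self, page, status):
        # Decide, based on the reported status, whether to reclaim.
        self.mapped_out.discard(page)
        if status is RepairStatus.SUCCESS:
            return True  # page is reclaimed
        self.retired.add(page)  # assumed policy: retire on failure
        return False


class Controller:
    """Toy controller model; repair_fn stands in for the self-repair action."""

    def __init__(self, host_os, repair_fn):
        self.host_os = host_os
        self.repair_fn = repair_fn
        self.released = set()

    def handle_failure_event(self, page):
        # 1. Proactively notify the OS of the failure event.
        # 2. Receive the OS communication that the page is mapped out.
        mapped_page = self.host_os.on_failure_notification(page)
        # 3. Release the page and initiate the self-repair action.
        self.released.add(mapped_page)
        status = self.repair_fn(mapped_page)
        # 4. Report the status; the OS decides whether to reclaim.
        return self.host_os.on_repair_status(mapped_page, status)
```

A caller would construct a `HostOS`, pass it to a `Controller` together with a repair stub such as `lambda page: RepairStatus.SUCCESS`, and call `handle_failure_event(page)`; the return value indicates whether the OS reclaimed the page in this model.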