US 12,093,712 B2
Method and apparatus for handling memory failure, electronic device and storage medium
Xiaowei Hu, Beijing (CN)
Assigned to BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD., Beijing (CN)
Filed by BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD., Beijing (CN)
Filed on Mar. 24, 2021, as Appl. No. 17/211,272.
Claims priority of application No. 202010477094.7 (CN), filed on May 29, 2020.
Prior Publication US 2021/0208923 A1, Jul. 8, 2021
Int. Cl. G06F 9/455 (2018.01); G06N 7/01 (2023.01)
CPC G06F 9/45558 (2013.01) [G06N 7/01 (2023.01); G06F 2009/4557 (2013.01); G06F 2009/45583 (2013.01)]
11 Claims
 
1. A method for handling a memory failure, comprising:
in response to detecting a failure occurring in memory of a host machine, acquiring a failure parameter of the memory;
determining a crash probability of the host machine based on the failure parameter; and
transferring all virtual machines on the host machine to a target host machine when the crash probability is greater than or equal to a first predetermined threshold, wherein a crash probability of the target host machine is less than a second predetermined threshold, and the second predetermined threshold is less than the first predetermined threshold;
wherein the method further comprises:
acquiring a first control instruction sent by a kernel system;
writing information into a target position of a target memory page of the host machine based on the first control instruction;
generating a first code corresponding to the target position of the target memory page based on the written information;
acquiring a second control instruction sent by a kernel system;
reading information out from the target position of the target memory page of the host machine based on the second control instruction;
generating a second code corresponding to the target position of the target memory page based on the read-out information; and
determining that the failure occurs in the target memory page when the first code is different from the second code;
wherein the acquiring the failure parameter of the memory comprises:
parsing the first code and the second code based on a predetermined algorithm;
acquiring difference codes between the first code and the second code after the parsing;
determining one or more incorrect bits corresponding to the target position of the target memory page based on the difference codes; and
determining a number of the one or more incorrect bits and position features of the one or more incorrect bits based on the one or more incorrect bits;
the method further comprising:
marking the memory when the crash probability of the host machine is less than the first predetermined threshold and greater than or equal to the second predetermined threshold; and
determining target virtual machines based on the crash probability of the host machine and a number of all the virtual machines on the host machine and transferring the target virtual machines, wherein a number of the target virtual machines is less than the number of all the virtual machines.
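
The following is a minimal illustrative sketch of two steps recited in claim 1: deriving the failure parameter from the first and second codes, and choosing which virtual machines to transfer from the two thresholds. It is not the patented implementation. All names (`FailureParameter`, `failure_parameter`, `plan_migration`) are hypothetical. The sketch assumes the codes are plain integers and that the "predetermined algorithm" is a bitwise XOR; the claim does not fix either. It also assumes the partial-transfer count is proportional to the crash probability, which is one possible reading of "based on the crash probability ... and a number of all the virtual machines". The crash probability itself is taken as an input, since the claim does not specify how it is computed.

```python
from dataclasses import dataclass
from math import ceil
from typing import List


@dataclass
class FailureParameter:
    """Hypothetical container for the failure parameter of claim 1."""
    incorrect_bit_count: int
    incorrect_bit_positions: List[int]


def failure_parameter(first_code: int, second_code: int) -> FailureParameter:
    """Derive incorrect bits from the write-time and read-time codes.

    Assumption: the difference codes are obtained by XOR, so every set
    bit in the result marks one incorrect bit (position 0 = least
    significant bit).
    """
    diff = first_code ^ second_code
    positions = [i for i in range(diff.bit_length()) if (diff >> i) & 1]
    return FailureParameter(len(positions), positions)


def plan_migration(crash_probability: float,
                   vm_ids: List[str],
                   first_threshold: float,
                   second_threshold: float) -> dict:
    """Pick the virtual machines to transfer, following the three bands
    implied by the claim's two thresholds."""
    if not second_threshold < first_threshold:
        raise ValueError("second threshold must be less than the first")

    if crash_probability >= first_threshold:
        # High risk: transfer all virtual machines. The claim does not
        # say whether the memory is marked in this case.
        return {"mark_memory": False, "migrate": list(vm_ids)}

    if crash_probability >= second_threshold:
        # Medium risk: mark the memory and transfer only some machines.
        # Assumption: count is proportional to the crash probability,
        # capped so it stays strictly below the total, as claimed.
        count = min(len(vm_ids) - 1, ceil(crash_probability * len(vm_ids)))
        return {"mark_memory": True, "migrate": list(vm_ids[:max(count, 0)])}

    # Low risk: the claim recites no action below the second threshold.
    return {"mark_memory": False, "migrate": []}


if __name__ == "__main__":
    print(failure_parameter(0b10110010, 0b10010110))
    print(plan_migration(0.5, ["vm1", "vm2", "vm3", "vm4"], 0.8, 0.3))
```

In this sketch the selection order of the partially transferred machines is simply list order; a real scheduler would presumably rank machines by some priority, which the claim leaves open.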