US 12,253,921 B2
CPU-GPU lockstep system
Da Qi Ren, Saratoga, CA (US); and Liang Peng, Los Gatos, CA (US)
Assigned to Huawei Technologies Co., Ltd., Shenzhen (CN)
Filed by Huawei Technologies Co., Ltd., Shenzhen (CN)
Filed on Apr. 14, 2023, as Appl. No. 18/300,642.
Application 18/300,642 is a continuation of application No. PCT/US2020/055955, filed on Oct. 16, 2020.
Prior Publication US 2023/0251941 A1, Aug. 10, 2023
Int. Cl. G06F 11/20 (2006.01); G06F 11/18 (2006.01)
CPC G06F 11/184 (2013.01) [G06F 11/2033 (2013.01)]
20 Claims
 
1. A processing system, comprising:
three or more central processing unit (CPU)-graphical processing unit (GPU) pairs, the CPU of each of the CPU-GPU pairs configured to run kernels for programs executing on a corresponding GPU of the CPU-GPU pair;
a backup CPU; and
a lockstep controller connected to the three or more CPU-GPU pairs and to the backup CPU, the lockstep controller configured to:
operate the three or more CPU-GPU pairs in parallel to execute programs in a lockstep manner, the CPU of each of the CPU-GPU pairs running kernels for the programs in parallel;
compare an output from each CPU of the three or more CPU-GPU pairs for each of one or more kernels running on the CPUs of the three or more CPU-GPU pairs;
based upon comparing the outputs, determine whether any of the CPU-GPU pairs are defective; and
in response to determining that a first of the CPU-GPU pairs is defective:
discontinue the operation of the first CPU-GPU pair in parallel to execute programs in a lockstep manner with others of the three or more CPU-GPU pairs; and
operate the others of the three or more CPU-GPU pairs and the backup CPU and the GPU of a second of the CPU-GPU pairs in parallel to execute programs in a lockstep manner, the CPU of the second CPU-GPU pair operating as a CPU-GPU pair with the GPU of the second CPU-GPU pair and the backup CPU operating as a CPU-GPU pair with the GPU of the second CPU-GPU pair, the backup CPU and the CPUs of the others of the CPU-GPU pairs running kernels for the programs in parallel.
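
The controller behavior recited in claim 1 can be pictured with a minimal sketch. This is illustrative only and is not taken from the patent: the names Pair, find_defective and reconfigure are hypothetical, kernel outputs are modeled as plain integers, and a strict majority vote is assumed as the comparison rule, which the claim itself does not specify. The choice of the first healthy pair as the "second" pair whose GPU is shared with the backup CPU is likewise an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional

Kernel = Callable[[int], int]  # hypothetical stand-in for a kernel run by a CPU


@dataclass
class Pair:
    cpu: str      # label of the CPU in the pair
    gpu: str      # label of the GPU in the pair
    run: Kernel   # models the CPU running a kernel for its GPU


def find_defective(outputs: List[int]) -> Optional[int]:
    """Return the index of the first output that disagrees with a strict
    majority, or None if all outputs agree. Majority voting is an assumption."""
    value, count = Counter(outputs).most_common(1)[0]
    if count == len(outputs):
        return None
    if count * 2 <= len(outputs):
        raise RuntimeError("no strict majority; cannot identify a defective pair")
    for i, out in enumerate(outputs):
        if out != value:
            return i
    return None


def reconfigure(pairs: List[Pair], backup_cpu: str, backup_run: Kernel,
                x: int) -> List[Pair]:
    """Run one lockstep comparison on input x. If a pair is found defective,
    drop it and pair the backup CPU with the GPU of a remaining pair."""
    outputs = [p.run(x) for p in pairs]          # every CPU runs the same kernel
    bad = find_defective(outputs)                # compare the per-CPU outputs
    if bad is None:
        return pairs                             # nothing to do
    healthy = [p for i, p in enumerate(pairs) if i != bad]
    donor = healthy[0]                           # assumed "second" pair
    # The backup CPU shares the donor's GPU; the donor pair itself is kept.
    return healthy + [Pair(cpu=backup_cpu, gpu=donor.gpu, run=backup_run)]
```

In this sketch the pair count stays at three after a fault, so the same comparison can continue on later kernels; the donor GPU then appears in two logical pairs, mirroring the claim's language that both the second pair's CPU and the backup CPU operate with the GPU of the second pair.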