CPC G06F 11/1407 (2013.01) [G06F 9/30101 (2013.01); G06F 9/3861 (2013.01); G06F 11/0772 (2013.01); G06F 11/1438 (2013.01)]
20 Claims
1. A computer-implemented method for checkpointing a context associated with an execution of a software application on a parallel processor, the method comprising:
determining that a kernel executing on a plurality of parallel processing elements included in the parallel processor is tagged to indicate that the kernel is enabled for intra-kernel checkpointing and restart;
causing the plurality of parallel processing elements to stop executing a first plurality of instructions included in the kernel in accordance with the context before executing a next instruction included in the first plurality of instructions;
causing the parallel processor to collect first state data associated with the context;
generating a checkpoint based on the first state data, wherein the checkpoint is stored in a memory associated with the parallel processor; and
causing the plurality of parallel processing elements to resume executing the first plurality of instructions included in the kernel at the next instruction in accordance with the context.
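The sequence of steps recited in claim 1 can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions, not the claimed implementation: the names `Kernel`, `Context`, `ckpt_enabled`, `run`, `checkpoint`, and `restore` are hypothetical, the "parallel processing elements" are modeled as lanes stepped in lockstep by a Python loop, the "state data" is reduced to a next-instruction index plus per-lane registers, and the checkpoint is held in an ordinary dictionary rather than in memory associated with a real parallel processor.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Kernel:
    instructions: list            # each instruction is a callable(lane_id, regs)
    ckpt_enabled: bool = False    # hypothetical tag: intra-kernel checkpoint/restart allowed


@dataclass
class Context:
    num_lanes: int
    pc: int = 0                   # index of the next instruction to execute
    regs: list = field(default_factory=list)

    def __post_init__(self):
        # Give every lane its own register file if none was supplied.
        if not self.regs:
            self.regs = [{"acc": 0} for _ in range(self.num_lanes)]


def run(kernel, ctx, stop_at=None):
    """Step all lanes in lockstep until the kernel ends or pc reaches stop_at."""
    while ctx.pc < len(kernel.instructions) and ctx.pc != stop_at:
        instr = kernel.instructions[ctx.pc]
        for lane in range(ctx.num_lanes):
            instr(lane, ctx.regs[lane])
        ctx.pc += 1


def checkpoint(kernel, ctx):
    """Collect state data and build a checkpoint, only if the kernel is tagged."""
    if not kernel.ckpt_enabled:
        return None
    return {"pc": ctx.pc, "regs": copy.deepcopy(ctx.regs)}


def restore(ckpt, num_lanes):
    """Rebuild a context from a checkpoint so execution can restart mid-kernel."""
    return Context(num_lanes=num_lanes, pc=ckpt["pc"],
                   regs=copy.deepcopy(ckpt["regs"]))


# Three toy instructions standing in for the kernel's instruction stream.
def add_lane(lane, regs): regs["acc"] += lane
def double(lane, regs): regs["acc"] *= 2
def add_one(lane, regs): regs["acc"] += 1


kernel = Kernel([add_lane, double, add_one], ckpt_enabled=True)
ctx = Context(num_lanes=4)

run(kernel, ctx, stop_at=2)       # stop before the next instruction (index 2)
ckpt = checkpoint(kernel, ctx)    # collect state and generate the checkpoint
run(kernel, ctx)                  # resume at the next instruction
```

In this toy model, a context rebuilt with `restore(ckpt, 4)` and passed to `run` finishes with the same per-lane results as the uninterrupted context, which is the property a mid-kernel checkpoint is meant to provide. Real hardware state (program counters per warp, shared memory, pending memory operations) is far richer than the two fields modeled here.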