Checkpointing and restarting, the art of stopping a computation in order to continue it later, possibly on another computer, is a convenient way to get past the time limits set on the clusters and to protect against hardware or software failures on the compute nodes.

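To make the idea concrete, here is a minimal sketch of application-level checkpointing in Python. It assumes a computation whose entire state fits in a small JSON file. The file name `checkpoint.json`, the `run` function and its `stop_after` parameter (which only simulates the job being killed, for example by a wall-time limit) are hypothetical illustrations, not part of any course material or cluster tool. The program saves its state every `every` iterations. Each save writes to a temporary file and then renames it, so an interruption during the write leaves the previous checkpoint intact. When the program is started again, it resumes from the last saved state instead of starting over.

```python
import json
import os
import tempfile

CHECKPOINT = "checkpoint.json"  # hypothetical default checkpoint file name


def load_state(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"i": 0, "total": 0}


def save_state(state, path):
    """Write the state to a temporary file, then rename it over the checkpoint.

    os.replace is atomic on POSIX, so a crash mid-write leaves the old
    checkpoint untouched instead of a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(n, path=CHECKPOINT, every=1000, stop_after=None):
    """Sum i*i for i in [0, n), checkpointing every `every` iterations.

    `stop_after` (hypothetical) simulates the job being killed after that
    many iterations in the current run; it then returns None.
    """
    state = load_state(path)  # resume from the last checkpoint, if any
    done_this_run = 0
    while state["i"] < n:
        state["total"] += state["i"] ** 2
        state["i"] += 1
        done_this_run += 1
        if state["i"] % every == 0:
            save_state(state, path)  # state is consistent here: both fields updated
        if stop_after is not None and done_this_run >= stop_after:
            return None  # "killed": work since the last checkpoint is lost
    save_state(state, path)
    return state["total"]
```

Only the work done since the last checkpoint has to be recomputed after an interruption, so the `every` interval trades the cost of writing checkpoints against the amount of work that may be lost. Restarting is then just a matter of running the same program again, for instance in a resubmitted job.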
Type: Hands-on
UCLouvain/CISM