Learning how to use HPC infrastructure (part II)

Europe/Brussels
Maxwell/Shannon (first floor) (Louvain-La-Neuve)

Place du Levant 3 1348 Louvain-la-Neuve Belgium
Description

We will continue to learn the fundamental tools needed to use a cluster. We will cover how to choose and activate the version of the software you need, how to edit a file on any HPC cluster, how to submit a job on the cluster (Slurm), and how to get past the walltime limit.

Contents:

  • Choosing and activating software with system modules on CÉCI clusters
  • Writing and editing text files with Vim
  • Preparing, submitting and managing jobs with Slurm
  • Using a Checkpoint/restart program to overcome time limits

Prerequisite:

  • Being able to use SSH with private keys 

Type: Lecture, Hands-on
Target audience: Rookie
Must: This session is a must-have for anyone.

Registration
48 / 50
    • 09:00 10:30
      Choosing and activating software with system modules on CÉCI clusters 1h 30m

      Software installed on the clusters is organised and managed with environment modules that allow choosing a specific version of a software package compiled with a given compiler, linked to chosen libraries, etc. This session explains how modules are used on the clusters.

      Contents:

      • The installed software
      • The modules command
      • What is EasyBuild
      • What are the different toolchains
      • How to install software by yourself

      Prerequisite:

      • Being able to use SSH with private keys 
      • Being familiar with a text editor 
      • Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)

      Type: Hands-on
      Target audience: Everyone
      Must: This session is a must for anyone.

      Speaker: Bernard Van Renterghem (UCL CISM)
    • 10:45 12:15
      Writing and editing text files with Vim 1h 30m

      Vim is a very powerful text editor available on virtually all Unix-like systems, including macOS, and used by many programs as the default text editor. Knowing the basics is crucial on HPC clusters. Mastering it will dramatically speed up tasks such as editing submission scripts, configuration files, and code.

      Contents:

      • Why use Vim on user interfaces?
      • Vim modes
      • Movement and action commands
      • Macros
      • Plugins

      Prerequisite:

      • Being able to use SSH with private keys 

      Type: Hands-on
      Target audience: Everyone
      Must: This session is a must for anyone.

      Speaker: Jérôme de Favereau (UCLouvain/IRMP/CP3)
    • 13:00 15:30
      Preparing, submitting and managing jobs with Slurm 2h 30m

      Slurm is the job manager installed on all CÉCI clusters. The session teaches attendees how to prepare a submission script and how to submit, monitor, and manage jobs on the clusters.

      Contents:

      • Role and duties of a job scheduler/resource manager 
      • Creating and submitting a job 
      • Setting job constraints and parameters 
      • Managing and monitoring jobs 
      • Working interactively 
      • Getting accounting information for the jobs 
      • How priorities are computed 
      • Creating parallel jobs with shared-memory software
      • Creating parallel jobs with message passing software
      • Creating parallel jobs with master/slave software 

      Prerequisite:

      • Being able to use SSH with private keys 
      • Being familiar with a text editor 
      • Passive knowledge of parallelisation techniques (OpenMP, MPI)
      • Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)

      Type: Hands-on
      Target audience: Everyone
      Must: This session is mandatory.

      Speaker: Damien François (UCLouvain/CISM)
    • 15:40 16:25
      Using a Checkpoint/restart program to overcome time limits 45m

      Checkpointing and restarting, the art of stopping a computation to continue it later or on another computer, is a very convenient way to get past the time limits set on the clusters and to protect against hardware or software failures on the compute nodes.
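
      To make the idea concrete, here is a minimal sketch of application-level checkpointing in Python, one of several possible approaches and not necessarily the one shown in the session. The file name `state.json`, the helper names, and the `stop_after` parameter (which simulates a job being killed at the time limit) are all assumptions made for illustration.

```python
import json
import os
import tempfile

CHECKPOINT = "state.json"  # hypothetical checkpoint file name


def load_state(path):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh)
    return {"next_step": 0, "total": 0}


def save_state(state, path):
    """Write to a temporary file, then rename it, so that an interrupted
    write cannot leave a half-written checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)


def run(n_steps, path=CHECKPOINT, every=10, stop_after=None):
    """Sum the integers 0 .. n_steps-1, checkpointing every `every` steps.

    `stop_after` simulates the job being killed after that many steps in
    this run: the function returns None without saving, so any work done
    since the last checkpoint is lost, as it would be on a real cluster.
    """
    state = load_state(path)
    done_this_run = 0
    while state["next_step"] < n_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return None  # simulated kill: nothing is saved here
        state["total"] += state["next_step"]
        state["next_step"] += 1
        done_this_run += 1
        if state["next_step"] % every == 0:
            save_state(state, path)
    save_state(state, path)
    return state["total"]
```

      Calling `run(100, stop_after=37)` and then `run(100)` finishes the computation in two runs: the second call resumes from the last checkpoint written at step 30 rather than from step 0. On a cluster, the same effect would be obtained by resubmitting the job until the program reports completion.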

      Contents:

      • Use and challenges of checkpointing
      • The different approaches
      • Checkpointing in Slurm
      • Using DMTCP for checkpointing

      Prerequisite:

      • Being able to use SSH with private keys 
      • Being familiar with a text editor 
      • Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)
      • Passive knowledge of either C, Fortran, Octave, Python or R

      Type: Hands-on
      Target audience: Everyone
      Must: This session is a must-have for anyone feeling oppressed by time limits.

      Speaker: Olivier Mattelaer (UCLouvain/CISM)