Data Management (Data storage and transfer, filesystems, open data, data management plan)

Europe/Brussels
Shanon (Building Maxwell, first floor), Louvain-la-Neuve

Place du Levant 3 1348 Louvain-la-Neuve Belgium
Description
Every HPC program produces or consumes data at some point.
Where you write and store these data, both while your program runs and after it has finished, matters for efficiency and for data preservation. Preservation is becoming more important with the growing relevance of Open Science.

Contents:

  • Introduction to data storage and access
  • Efficient data storage on CECI clusters
  • Open Science and Open Research Data / Data Management Plan

Prerequisite: None

Type: presentation, discussions and hands on


Must: This session is a must-have for researchers concerned with the dissemination and impact of their research results.

Registration
16 / 50
    • 09:30-10:45
      Introduction to data storage and access (1h 15m)

      Storing data efficiently is very important for many scientific applications. Yet, most of the time, data are spread over a myriad of small files, which places a heavy burden on the filesystem, wastes a lot of time in file access, makes transfers very inefficient, etc. This session presents alternatives.

      Contents:

      • Storing data in files vs. in a database
      • Using an in-memory database
      • Using HDF5 CLI tools and libraries
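
As an illustration of the "files vs. database" idea, here is a minimal sketch assuming Python's standard sqlite3 module (one possible tool among several; the session may use others, such as HDF5). Instead of writing one small file per result, all results go into a single table, and the special path ":memory:" keeps the database entirely in RAM. The helper store_results and the table layout are hypothetical.

```python
import sqlite3


def store_results(db_path, results):
    """Hypothetical helper: store (sample_id, value) pairs as rows of one table
    instead of writing one small file per sample."""
    conn = sqlite3.connect(db_path)
    with conn:  # the connection context manager commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS results (sample_id INTEGER PRIMARY KEY, value REAL)"
        )
        conn.executemany("INSERT OR REPLACE INTO results VALUES (?, ?)", results)
    return conn


# ":memory:" keeps the whole database in RAM for the lifetime of the connection.
conn = store_results(":memory:", ((i, i * 0.5) for i in range(100_000)))
count, total = conn.execute("SELECT COUNT(*), SUM(value) FROM results").fetchone()
print(count, total)
conn.close()
```

Passing a file path such as "results.db" instead of ":memory:" would persist the same table to a single file on disk, replacing 100,000 separate files with one.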

      Prerequisite:

      • Being able to use SSH with private keys 
      • Being familiar with a text editor 
      • Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)
      • Passive knowledge of either C, Fortran, Octave, Python or R
      • Working knowledge of C or Fortran
      • Familiarity with OpenMP and MPI

      Type: Hands-on
      Target audience: Everyone
      Must: This session is a must-have for anyone who thinks generating a million small files is an optimal way of storing data.

      Speaker: Damien François (UCLouvain/CISM)
    • 11:00-12:45
      Efficient data storage on CECI clusters (1h 45m)

      The CECI clusters are equipped with different storage solutions that you can use for managing your data.
      Each has different properties, such as capacity, I/O performance, accessibility and data longevity, because they are meant for different uses.
      In this presentation we will go through the options available on the clusters and explain how to organize your workflows to use them efficiently and practically.

      Contents:

      • Storage solutions on the CECI clusters
      • CECI environment variables for data location
      • Data operations inside Slurm batch scripts
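
The stage-in/stage-out pattern often used inside batch jobs can be sketched in Python as follows. This is a minimal sketch assuming that $GLOBALSCRATCH points to a shared scratch area and $LOCALSCRATCH to a node-local directory, as the variable names suggest; check the CECI documentation for the exact semantics on each cluster. The helper run_with_local_staging and the file names are hypothetical, and in a real Slurm script the same steps are often written as shell commands.

```python
import os
import shutil
import tempfile
from pathlib import Path


def run_with_local_staging(input_name, output_name, transform, env=os.environ):
    """Hypothetical helper: stage in, process on local disk, stage out."""
    global_dir = Path(env["GLOBALSCRATCH"])  # assumed: shared scratch, visible from all nodes
    local_dir = Path(env["LOCALSCRATCH"])    # assumed: fast node-local scratch for this job

    # Stage in: copy the input once from shared to local storage.
    local_in = local_dir / input_name
    shutil.copy2(global_dir / input_name, local_in)

    # Process: intermediate I/O hits the local disk, not the shared filesystem.
    local_out = local_dir / output_name
    local_out.write_text(transform(local_in.read_text()))

    # Stage out: copy results back before the job ends, since local scratch
    # is typically cleaned up afterwards (assumption; check your cluster's policy).
    final_path = global_dir / output_name
    shutil.copy2(local_out, final_path)
    return final_path


if __name__ == "__main__":
    # Outside a cluster, emulate both locations with temporary directories.
    with tempfile.TemporaryDirectory() as gdir, tempfile.TemporaryDirectory() as ldir:
        fake_env = {"GLOBALSCRATCH": gdir, "LOCALSCRATCH": ldir}
        Path(gdir, "input.txt").write_text("hello\n")
        result = run_with_local_staging("input.txt", "output.txt", str.upper, fake_env)
        print(result.read_text(), end="")  # HELLO
```

The env parameter defaults to the real environment, so inside a job the helper picks up the variables set by the cluster, while the demo substitutes temporary directories.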

      Prerequisite:

      • Being able to log in to a cluster
      • Being familiar with a text editor 
      • Being able to submit jobs with Slurm
      • Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)

      Type: Hands-on
      Target audience: Everyone
      Must: This session is a must-have for anyone who doesn't know where $HOME, $WORKDIR, $GLOBALSCRATCH, $LOCALSCRATCH or $CECIHOME point to.

      Speaker: Ariel Lozano (ULB)
    • 14:00-16:00
      Open Science and Open Research Data / Data Management Plan (2h)

      The growing relevance of Open Science poses challenges to research practices. Open Research Data, which aims to provide free access to research data in order to ensure the reproducibility of scientific results, is one important aspect of Open Science. Research Data Management (RDM), for its part, addresses the entire life cycle of data, covering planning, collection, management, storage, publication, referencing, preservation and sharing of research data, as well as access and reuse rights.

      This seminar addresses questions of openness and covers both the integration of Open Data/FAIR Data into research data management principles and practical aspects such as publishing data in repositories.

      Speaker: Jonathan Dedonder (IACCHOS)