Data Management (Data storage and transfer, filesystems, open data, data management plan)
Tuesday, 5 December 2023 -
09:30
09:30
Introduction to data storage and access - Damien François (UCLouvain/CISM)
09:30 - 10:45
Room: Shanon (Building Maxwell, first floor)
<table border="0" cellpadding="10px"> <tbody> <tr> <td colspan="2"> <p>Storing data efficiently is very important for many scientific applications. Yet, most of the time, a myriad of small files is used, which imposes a large burden on the file system, wastes a lot of time on file access, makes transfers very inefficient, etc. Other solutions exist and are presented in this session.</p> </td> </tr> <tr> <td rowspan="2"> <p><strong>Contents:</strong></p> <ul> <li>Storing data in files vs. in a database</li> <li>Using an in-memory database</li> <li>Using HDF5 CLI tools and libraries</li> </ul> </td> <td> <p><strong>Prerequisite:</strong></p> <ul> <li>Being able to use SSH with private keys </li> <li>Being familiar with a text editor </li> <li>Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)</li> <li>Passive knowledge of either C, Fortran, Octave, Python or R</li> <li>Working knowledge of C or Fortran</li> <li>Familiarity with OpenMP and MPI</li> </ul> </td> </tr> <tr> <td> <p><strong>Type:</strong> Hands-on<br /> <strong>Target audience</strong>: Everyone<br /> <strong>Must: </strong>This session is a must-have for anyone who thinks generating a million small files is an optimal way of storing data.</p> </td> </tr> </tbody> </table>
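As an illustration of the "in-memory database" item above, here is a minimal sketch (not taken from the session material) that keeps a thousand small records in a single SQLite database instead of writing one file per record. It uses Python's standard sqlite3 module; the table name results, the record names and the helper functions are hypothetical and chosen only for this example.

```python
import sqlite3


def store_records(conn, records):
    """Insert (name, payload) pairs into one table instead of one file each."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results (name TEXT PRIMARY KEY, payload BLOB)"
    )
    # Using the connection as a context manager commits all inserts at once.
    with conn:
        conn.executemany("INSERT OR REPLACE INTO results VALUES (?, ?)", records)


def load_record(conn, name):
    """Return the payload stored under `name`, or None if it does not exist."""
    row = conn.execute(
        "SELECT payload FROM results WHERE name = ?", (name,)
    ).fetchone()
    return None if row is None else row[0]


# ":memory:" keeps the whole database in RAM; a file path would persist it.
conn = sqlite3.connect(":memory:")

# Hypothetical data: 1000 tiny records that would otherwise be 1000 files.
records = [(f"sample_{i:04d}", f"value={i * i}".encode()) for i in range(1000)]
store_records(conn, records)

count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
print(count, load_record(conn, "sample_0012"))
```

Replacing ":memory:" with a file path would put all records in one file on disk, so the file system sees a single file rather than a thousand.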
11:00
Efficient data storage on CECI clusters - Ariel Lozano (ULB)
11:00 - 12:45
Room: Shanon (Building Maxwell, first floor)
<table border="0" cellpadding="10px"> <tbody> <tr> <td colspan="2"> <p>The CECI clusters are equipped with <a class="moz-txt-link-freetext" href="https://support.ceci-hpc.be/doc/_contents/ManagingFiles/Storage.html">different storage solutions</a> that you can use for managing your data.<br /> Each of them has different properties, such as capacity, I/O performance, accessibility and data longevity, as they are meant for different usages.<br /> In this presentation we will go through the different options available on the clusters and explain how to organize your workflows to make efficient and practical use of them.</p> </td> </tr> <tr> <td rowspan="2"> <p><strong>Contents:</strong></p> <ul> <li>Storage solutions on the CECI clusters<br /> </li> <li>CECI environment variables for data location</li> <li>Data operations inside Slurm batch scripts</li> </ul> </td> <td> <p><strong>Prerequisite:</strong></p> <ul> <li>Being able to log in to a cluster </li> <li>Being familiar with a text editor </li> <li>Being able to submit jobs with Slurm</li> <li>Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)</li> </ul> </td> </tr> <tr> <td> <p><strong>Type:</strong> Hands-on<br /> <strong>Target audience</strong>: Everyone<br /> <strong>Must: </strong>This session is a must-have for anyone who doesn't know where $HOME, $WORKDIR, $GLOBALSCRATCH, $LOCALSCRATCH or $CECIHOME point to.</p> </td> </tr> </tbody> </table>
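To give an idea of what "data operations" in a job can look like, the following is a rough sketch (an assumption, not the session's own material) of a common staging pattern: copy the input to a scratch directory, compute there, then copy the result back to a longer-lived location. It reads the LOCALSCRATCH environment variable named above and falls back to the system temporary directory when the variable is not set. The helpers run_staged and count_lines are hypothetical; the actual locations and policies are described in the linked CECI documentation.

```python
import os
import shutil
import tempfile
from pathlib import Path


def scratch_dir(var_name, fallback):
    """Return the directory named by an environment variable, or a fallback."""
    return Path(os.environ.get(var_name, fallback))


def run_staged(input_path, result_dir, compute):
    """Copy the input to scratch, run `compute` there, copy the result back."""
    # Assumption: LOCALSCRATCH points to a writable scratch directory.
    local_root = scratch_dir("LOCALSCRATCH", tempfile.gettempdir())
    local_root.mkdir(parents=True, exist_ok=True)
    work = Path(tempfile.mkdtemp(prefix="job_", dir=local_root))
    try:
        local_input = Path(shutil.copy2(input_path, work))
        local_output = compute(local_input, work)
        result_dir = Path(result_dir)
        result_dir.mkdir(parents=True, exist_ok=True)
        return Path(shutil.copy2(local_output, result_dir))
    finally:
        # Always clean up the scratch working directory, even on failure.
        shutil.rmtree(work, ignore_errors=True)


def count_lines(local_input, work):
    """Hypothetical workload: write the number of input lines to a file."""
    with local_input.open() as handle:
        n_lines = sum(1 for _ in handle)
    output = work / "line_count.txt"
    output.write_text(f"{n_lines}\n")
    return output
```

In a real job, input_path and result_dir would typically sit under one of the shared locations listed above (for instance a directory under $GLOBALSCRATCH or $HOME), and the script would be launched from the Slurm batch script.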
14:00
Open Science and Open Research Data / Data Management Plan - Jonathan Dedonder (IACCHOS)
14:00 - 16:00
Room: Shanon (Building Maxwell, first floor)
The growing relevance of Open Science poses challenges to research practices. Open Research Data, which aims to provide free access to research data in order to ensure the reproducibility of scientific results, is one important aspect of Open Science. Research Data Management (RDM), for its part, addresses the entire life cycle of data, covering the planning, collection, management, storage, publication, referencing, preservation and sharing of research data, as well as access and reuse rights. This seminar addresses concerns about openness and covers the integration of Open Data/FAIR Data into research data management principles, as well as practical aspects such as the publication of data in repositories.