We discuss design and performance issues for computational science applications on GPU-based HPC platforms from a user/developer perspective. We focus particularly on the strong-scale limit, where HPC users typically run in practice. This limit is closely tied to the local problem size, that is, the number of degrees of freedom (or grid points) per MPI rank, which reflects the amount of parallel work for domain-decomposition-based parallelism. As this number decreases, communication effects become important. On GPUs, kernel launch latency is another factor controlling achievable speed-ups. We discuss several features that distinguish algorithmic optimization strategies for GPU-based HPC platforms from their predecessors, fine-grained distributed-memory platforms such as IBM's BG series. As concrete examples, we consider the performance of high-order spectral element methods for incompressible flow on all of Mira (> 1M ranks), all of Summit (> 27,000 ranks), and all of Frontier (> 70,000 ranks). We discuss optimizations for each of these platforms, with a particular focus on the Poisson problem, which is the stiffest and therefore most time-consuming substep in Navier-Stokes time advancement. Examples are presented in the context of Nek5000/RS, a high-order open-source code for thermal-fluid transport problems.
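The strong-scaling behavior described above can be sketched with a simple cost model: per-rank work shrinks as ranks are added, while communication latency and the per-step kernel launch overhead do not. The following is a minimal illustrative sketch, not a model from the talk; all parameter names and values (`flop_rate`, `latency`, `kernels_per_step`, `launch_latency`, etc.) are hypothetical placeholders chosen only to show the qualitative trend.

```python
import math

# Hypothetical strong-scaling cost model; every parameter value below is
# illustrative, not a measured figure from the work being described.
def step_time(n_global, ranks, flop_rate=1e12, flops_per_dof=1e3,
              latency=2e-6, kernels_per_step=100, launch_latency=5e-6):
    """Estimated wall time per timestep under domain decomposition.

    n_global         : total degrees of freedom (grid points)
    ranks            : number of MPI ranks (one GPU per rank assumed)
    flop_rate        : sustained flop/s per rank (assumed)
    flops_per_dof    : work per grid point per step (assumed)
    latency          : per-message network latency (assumed)
    kernels_per_step : GPU kernels launched per timestep (assumed)
    launch_latency   : overhead per kernel launch (assumed)
    """
    n_local = n_global / ranks                    # local problem size
    t_work = n_local * flops_per_dof / flop_rate  # shrinks as ranks grow
    t_comm = latency * math.log2(ranks)           # grows slowly with ranks
    t_launch = kernels_per_step * launch_latency  # fixed floor per step
    return t_work + t_comm + t_launch

# As ranks increase, t_work vanishes but the launch-latency floor remains,
# capping the achievable strong-scale speed-up on GPU platforms.
for p in (1024, 8192, 65536):
    print(f"{p:6d} ranks: {step_time(1e9, p):.6f} s/step")
```

Under this toy model, adding ranks eventually buys nothing: once the local problem size is small enough, the fixed per-step launch overhead dominates, which is the strong-scale limit the abstract refers to.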
Paul Fischer received his Ph.D. in mechanical engineering from MIT in 1989. After postdoctoral research at MIT and Caltech, he joined Brown University as an assistant professor in the Applied Mathematics department. In 1998, he was hired as a mathematician at Argonne and was promoted to senior scientist in 2008. He is a world leader in computational fluid dynamics.
More information about Paul Fischer is available on the Argonne National Laboratory website: https://www.anl.gov/profile/paul-f-fischer