Towards an Efficient Spectral Element Solver for Poisson’s Equation on Heterogeneous Platforms
Abstract: Neko is a project at KTH to refactor the widely used fluid dynamics solver Nek5000 to support modern hardware. Many aspects of the solver need adapting for use on GPUs, and one such part is the main communication kernel, the Gather-Scatter (GS) routine. To avoid race conditions in the kernel, atomic operations are used, which can be inefficient. To avoid the use of atomics, elements were grouped in such a way that when multiple writes to the same address are necessary, they always come in contiguous blocks. Each block can then be assigned to a single thread and handled sequentially, removing the need for atomic operations altogether. In the scope of the thesis, a Poisson solver was also ported from CPU to Nvidia GPUs. To optimise the Poisson solver, a batched matrix multiplication kernel was developed that performs many small matrix multiplications in bulk and so makes better use of the GPU. Further optimisations using shared memory and kernel unification were also implemented. The performance of the different implementations was tested on two systems, one with a GTX 1660 and one with dual Nvidia A100 GPUs. The results show only small differences in performance between the two versions of the GS kernel when only computational cost is considered, and in a multi-rank setup the communication time completely overwhelms any potential difference. The shared memory matrix multiplication kernel yielded around a 20% performance boost for the Poisson solver. Both versions of the custom kernel vastly outperformed cuBLAS. The unified kernel also had a large positive impact on performance, yielding up to a 50% increase in throughput.
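The blocked gather-scatter idea described in the abstract can be illustrated with a minimal CPU-side sketch in C++. This is not the Neko implementation: the `BlockedGather` struct, its field names, and the functions `gather_add` and `scatter` are hypothetical and introduced only for illustration. The sketch assumes the local degrees of freedom have already been grouped so that all contributions to one shared value are contiguous. Each iteration of the outer loop stands in for one GPU thread, which is the only writer of its block's result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout (an assumption, not Neko's data structure):
// local indices are grouped so that every set of entries contributing
// to the same shared value forms one contiguous block.
struct BlockedGather {
    std::vector<std::size_t> local_index;   // local dof indices, grouped by block
    std::vector<std::size_t> block_offset;  // block b spans [block_offset[b], block_offset[b+1])
};

// Gather phase: sum the contributions of each block.
// On a GPU, the outer loop index b would be the thread index. Because
// shared[b] has exactly one writer, no atomic operation is required.
std::vector<double> gather_add(const BlockedGather& gs,
                               const std::vector<double>& u) {
    if (gs.block_offset.empty()) return {};
    const std::size_t nblocks = gs.block_offset.size() - 1;
    std::vector<double> shared(nblocks, 0.0);
    for (std::size_t b = 0; b < nblocks; ++b) {
        double sum = 0.0;
        // Sequential accumulation inside the block.
        for (std::size_t k = gs.block_offset[b]; k < gs.block_offset[b + 1]; ++k) {
            sum += u[gs.local_index[k]];
        }
        shared[b] = sum;
    }
    return shared;
}

// Scatter phase: write each block's summed value back to all of its
// local entries. Each local index belongs to one block, so writes do
// not collide.
void scatter(const BlockedGather& gs,
             const std::vector<double>& shared,
             std::vector<double>& u) {
    if (gs.block_offset.empty()) return;
    const std::size_t nblocks = gs.block_offset.size() - 1;
    for (std::size_t b = 0; b < nblocks; ++b) {
        for (std::size_t k = gs.block_offset[b]; k < gs.block_offset[b + 1]; ++k) {
            u[gs.local_index[k]] = shared[b];
        }
    }
}
```

In an atomics-based variant, by contrast, every local entry would be handled by its own thread and added into the shared value with an atomic add. The blocked form trades that fine-grained parallelism for conflict-free writes, which is consistent with the abstract's finding that the two variants perform similarly.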