Automatic GPU optimization through higher-order functions in functional languages

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: John Wikman; [2020]

Keywords: ;

Abstract: Over recent years, graphics processing units (GPUs) have become popular devices to use in procedures that exhibit data-parallelism. Due to high parallel capability, running procedures on a GPU can result in an execution time speedup ranging from a couple times faster to several orders of magnitude faster, compared to executing serially on a central processing unit (CPU). Interfaces such as CUDA and OpenCL flexibly exposes the parallel capabilities of the GPU to the programmer, while at the same time putting a lot of responsibility on the programmer to handle aspects such as thread synchronization and memory management. A different approach to GPU optimization is to enable it through higher-order functions with known data-parallelism, using the semantics of the higher-order function to determine the parallel execution. This approach has in practice been integrated into existing languages through libraries or been integrated directly into languages themselves. However, higher-order functions do not address when it is beneficial to execute on a GPU. Due to the GPU being a separate device, effects such as latency and memory transfer can cause a slowdown for small input values. In this thesis, a set of commonly used higher-order functions are GPU enabled as compiler intrinsics in a small functional language. These higher-order functions are also equipped with the option of automatically deciding at runtime if to execute on GPU or CPU. Results show that running higher-order functions on GPU yields a speedup for larger computations. However, the performance does not match existing solutions that provide additional higher-order functions for optimizing the parallelization. The selected approach for automatically deciding whether to run a higher-order function on GPU or on CPU results in the faster option a majority of cases. Though the most notable benefit of automatic decisions was for procedures that use multiple higher-order function invocations, which ran faster compared to when executing only on GPU or only on CPU.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)