Particle Simulation using Asynchronous Compute : A Study of The Hardware

University essay from Blekinge Tekniska Högskola/Institutionen för datavetenskap

Abstract:

Background. With the introduction of the compute shader, followed by the application programming interface (API) DirectX 12, the modern GPU is going through a transformation. Previously, the GPU was used as a massive computational tool for running a single task at unparalleled speed. The compute shader made it possible to run CPU-like programs on the GPU, and DirectX 12 takes this further by introducing a multi-engine architecture. The multi-engine architecture makes it possible to run the compute shader alongside the regular graphics stages; this concept is called asynchronous compute.

Objectives. This thesis investigates whether asynchronous compute can be used to increase the performance of particle simulations. The key metrics studied are total frame time, rendered frames per second, and overlap time. The first two are used to determine whether asynchronous compute improves performance, while the last is used to determine whether the particle simulation is actually running as asynchronous compute.

Methods. The particle simulation used in this thesis is the N-body particle simulation. It is implemented using a compute shader and is part of a larger DirectX 12 framework. A single application is implemented that runs two different execution models: the standard sequential execution model and the asynchronous compute model. The main difference between the two is that the sequential execution model uses only one command queue, a 3D command queue, whereas the asynchronous compute model runs a separate compute command queue alongside the 3D command queue. All performance metrics are collected using a custom-built GPU profiler.

Results. The results indicate that it is possible to increase the performance of particle simulations using asynchronous compute. The registered performance gain reaches as high as 34% on hardware that supports asynchronous compute, while hardware that according to NVIDIA does not support asynchronous compute registered performance gains of up to 11%. In terms of overlap time between the compute workload and the graphics workload, the AMD GPU showed an overlap time that matched the frame time. The NVIDIA GPUs, however, did not show the expected overlap time.

Conclusions. Asynchronous compute provides benefits compared to the sequential execution model and can be used to increase the performance of particle simulations. However, since the research in this thesis made use of only a single particle simulation, more work is needed, for example to test whether the performance gain can be improved further using methods such as workload pairing or multiple GPUs. That kind of work requires a larger-scale application that consists of multiple different tasks rather than just a single particle simulation.
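
The abstract does not describe how the custom GPU profiler derives its metrics, so the following is only an illustrative sketch of how overlap time and a relative performance gain could be computed from recorded timings. The function names, the (start, end) interval representation, and the definition of gain as the relative reduction in frame time are all assumptions made for this example, not the thesis's method.

```python
def overlap_time(compute, graphics):
    """Time during which two (start, end) intervals run concurrently.

    Hypothetical helper: each interval is assumed to be a pair of
    timestamps, in the same unit, for one workload within a frame.
    """
    start = max(compute[0], graphics[0])
    end = min(compute[1], graphics[1])
    # Disjoint intervals give a negative difference; clamp to zero.
    return max(0.0, end - start)


def frame_time_gain_percent(sequential_ms, async_ms):
    """Relative frame-time reduction of the asynchronous model, in percent.

    Assumed definition: (sequential - async) / sequential * 100.
    """
    return (sequential_ms - async_ms) / sequential_ms * 100.0
```

Under these assumptions, an overlap of zero would suggest that the compute and graphics workloads were serialized, while an overlap close to the frame time would suggest that they ran concurrently for most of the frame.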
