High Performance Implementationof Winner Selection Algorithms
Abstract: The process to find candidates, that fit certain win-conditions, from a collectionof wagers is the purpose of a winner selection algorithm. These candidates are filtered out in different groups called winning groups depending on the win-conditions. Svenska Spel AB is the largest company in the regulated gaming market in Sweden. It is crucial for winner selection algorithms at Svenska Spel to run as efficiently as possible, since results needs to be produced in near real time for many different games. Additionally, new services and features can be developed by having efficient algorithms that are not feasible with current sequential implementations. In this paper a variety of parallel approaches using OpenMP, Pthreads and CUDA are investigated to create efficient implementations of winner selection algorithms for the games Lotto and Bomben at Svenska Spel. Various preprocessing schemes are used on the original dataset to affect calculation times and enable different types of algorithms. Some of the algorithms are also extended, meaning they run on several, if not all, permutations of possible outcomes, something that is not possible to execute in reasonable time with thecurrent implementations. If these extended runs are feasible then it enables the possibility for new functionality with a more detailed statistical overview thatwere too compute heavy or slow to determine before. OpenMP and Pthreads run on the CPU while the CUDA algorithm uses the GPU. By using a baseline implementation they are all individually compared to determine their speed up. The effect preprocessing overhead and data allocation for CUDA is also evaluated. It results indicate that by performing all required preprocessing for the different approaches do not yield any performance gain. The preprocessing time, and data transfer to the GPU, occupies too large of a chunk of the execution timethat it is impossible to gain anything from doing the computations in parallel. However, by utilizing the preprocessed data several times significant speed upcan be achieved. The extended algorithm for Lotto runs more than 200 times faster on the GPU compared to the baseline algorithm. The parallel implementations for Bomben ranges from seven to 20 times the speed up, whichis not as impressive but arguable usable in different cases.
AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)