Predicting Thread and Page Mappings for NUMA systems
Abstract: Modern multi-core systems have Non-Uniform Memory Access (NUMA) effects, where the access time of a thread to its data (page) depends on the NUMA node memory connection. Applications can be executed across a large range of diverse mapping combinations, such as Contiguous and Scatter thread policies and First-touch, Locality and Balance page policies. The number of NUMA nodes used and the degree of parallelism must also be considered. Because of the complex interaction between threads and data, and due to system complexity, it is challenging to predict the performance impact of a mapping. In this thesis, we propose a clustering model, trained on diverse applications, to predict the best mapping. The model takes advantage of the fact that different applications are sensitive in the same way to different NUMA changes. To implement the model, we first clustered codes based on their relative execution improvement across different mappings. We assume that codes in the same cluster will benefit from the same mapping: to predict a new code's mapping, our model assigns it to an existing cluster and selects that cluster's mapping. We further improved our model by directly clustering the codes over a subset of hardware performance counters. Performance counters are able to collect more relevant information and allowed us to cluster the codes with only two features (execution time and local bandwidth) while reasonably preserving the model's performance gains. In other words, instead of running all the mappings, with only two runs (one run per feature) we were able to select mappings that achieve almost 2x speedup. As a result, our approach provides competitive performance in comparison to the most recent related work, as we consider a large space of NUMA mappings.
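The cluster-then-predict idea described above can be illustrated with a minimal sketch. All names, data values, and the nearest-centroid assignment below are hypothetical stand-ins, not the thesis's actual implementation: each known code's profile is its relative speedup over a set of NUMA mappings, a new code is assigned to the nearest existing profile, and the best-performing mapping of that profile is selected.

```python
# Minimal sketch of the cluster-then-predict approach from the abstract.
# Profiles, mapping names, and the distance-based assignment are illustrative.
import math

# Relative execution improvement of each known code under 3 example
# NUMA mappings (e.g. Contiguous+First-touch, Scatter+Locality,
# Scatter+Balance); values are made up for illustration.
profiles = {
    "codeA": [1.0, 1.8, 1.2],
    "codeB": [1.1, 1.9, 1.3],
    "codeC": [1.7, 1.0, 1.1],
}

def dist(u, v):
    """Euclidean distance between two improvement vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def predict_mapping(new_profile):
    """Assign a new code to the nearest known profile and return
    that profile's best mapping index. (The thesis clusters many
    codes first; here each known code stands in for a centroid.)"""
    nearest = min(profiles, key=lambda c: dist(profiles[c], new_profile))
    best_mapping = max(range(len(profiles[nearest])),
                       key=lambda m: profiles[nearest][m])
    return nearest, best_mapping

cluster, mapping = predict_mapping([1.05, 1.85, 1.25])
print(cluster, mapping)
```

The refinement described later in the abstract replaces the full improvement vectors with just two performance-counter features (execution time and local bandwidth), so the profile for a new code can be obtained from two runs instead of one run per mapping.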