A tool for monitoring resource usage in large scale supercomputing clusters

University essay from PELAB - Laboratoriet för programmeringsomgivningar (Programming Environments Laboratory); Tekniska högskolan (Institute of Technology)

Author: Andreas Petersson; [2012]

Abstract: In recent years, large-scale computer clusters have become the dominant platform for applications that require extremely high computing capacity. Such a cluster consists of a large set of ordinary servers interconnected by a fast network. Because each node runs its own instance of the operating system and in that sense works autonomously, supervising the cluster as a whole is a challenge. Looking at a single computer is not enough to get an overview of the efficiency and utilization of the system; all nodes must be monitored to get a good view of how the cluster behaves. Monitoring performance counters in a large-scale computing cluster raises several difficulties. How can samples of performance metrics be made available to an operator? How can these samples be stored? How can a large set of samples be visualized in a meaningful way? This thesis discusses how such a monitoring system can be implemented, which problems one may encounter, and possible solutions.
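The point that a cluster must be judged from all of its nodes rather than from one can be illustrated with a minimal Python sketch. It assumes that each node periodically reports a single metric (for example CPU utilization in percent), that only a bounded history is kept per node so that storage stays constant however long monitoring runs, and that an operator's overview is built from each node's most recent sample. The names `ClusterMonitor`, `record` and `summary` are hypothetical and chosen for illustration; this is not the design described in the thesis.

```python
from collections import deque
from statistics import mean


class ClusterMonitor:
    """Hypothetical collector for one metric reported by many cluster nodes."""

    def __init__(self, capacity=60):
        # Maximum number of samples retained per node (assumed value).
        self.capacity = capacity
        # Node name -> deque of (timestamp, value) samples.
        self.history = {}

    def record(self, node, timestamp, value):
        """Store one sample; the oldest sample is dropped once a node's deque is full."""
        series = self.history.get(node)
        if series is None:
            series = deque(maxlen=self.capacity)
            self.history[node] = series
        series.append((timestamp, value))

    def summary(self):
        """Cluster-wide view built from the latest sample of every node."""
        latest = {node: s[-1][1] for node, s in self.history.items() if s}
        if not latest:
            return {}
        values = list(latest.values())
        return {
            "nodes": len(values),
            "mean": mean(values),
            "min": min(values),
            "max": max(values),
            "busiest": max(latest, key=latest.get),
        }


if __name__ == "__main__":
    mon = ClusterMonitor(capacity=3)
    for t in range(5):
        mon.record("node01", t, 10.0 * t)   # only the last 3 samples are kept
    mon.record("node02", 4, 95.0)
    print(len(mon.history["node01"]))       # 3
    print(mon.summary())
    # {'nodes': 2, 'mean': 67.5, 'min': 40.0, 'max': 95.0, 'busiest': 'node02'}
```

Using a `deque` with `maxlen` keeps memory per node fixed, which matters when the number of nodes is large; a real system would also have to decide how samples reach the collector and how older data is aggregated rather than simply discarded.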

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)