Predicting Service Metrics from Device and Network Statistics

University essay from KTH/Kommunikationsnät

Abstract: For an IT company that provides a service over the Internet like Facebook or Spotify, it is very important to provide a high quality of service; however, predicting the quality of service is generally a hard task. The goal of this thesis is to investigate whether an approach that makes use of statistical learning to predict the quality of service can obtain accurate predictions for a Voldemort key-value store [1] in presence of dynamic load patterns and network statistics. The approach follows the idea that the service-level metrics associated with the quality of service can be estimated from serverside statistical observations, like device and network statistics. The advantage of the approach analysed in this thesis is that it can virtually work with any kind of service, since it is based only on device and network statistics, which are unaware of the type of service provided. The approach is structured as follows. During the service operations, a large amount of device statistics from the Linux kernel of the operating system (e.g. cpu usage level, disk activity, interrupts rate) and some basic end-to-end network statistics (e.g. average round-trip-time, packet loss rate) are periodically collected on the service platform. At the same time, some service-level metrics (e.g. average reading time, average writing time, etc.) are collected on the client machine as indicators of the store’s quality of service. To emulate network statistics, such as dynamic delay and packet loss, all the traffic is redirected to flow through a network emulator. Then, different types of statistical learning methods, based on linear and tree-based regression algorithms, are applied to the data collections to obtain a learning model able to accurately predict the service-level metrics from the device and network statistics. The results, obtained for different traffic scenarios and configurations, show that the thesis’ approach can find learning models that can accurately predict the service-level metrics for a single-node store with error rates lower than 20% (NMAE), even in presence of network impairments.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)