Deep Reinforcement Learning for Adaptive Resource Allocation in Virtualized Network Functions

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Simon Ignat; [2018]


Abstract: Network Function Virtualization (NFV) is the transition within the telecommunication industry from proprietary hardware functions to virtualized counterparts, known as Virtualized Network Functions (VNFs), which are the main building blocks of NFV. The transition started in 2012 and is still ongoing, with research and development moving at a high pace. It is believed that virtualization can lower both capital and operating expenses as a result of easier deployments, cheaper systems, and networks that can operate more autonomously. This thesis examines whether the current state of NFV can lower operating expenses while maintaining a high quality of service (QoS) by using state-of-the-art machine learning algorithms. More specifically, the thesis analyzes the problem of adaptive autoscaling of the virtual machines (VMs) allocated by VNFs using deep reinforcement learning (DRL). To analyze the task, the thesis implements a discrete-time model of VNFs that captures the fundamental characteristics of the scaling operation, and examines the learning and robustness/generalization of six state-of-the-art DRL algorithms. These algorithms are chosen because they differ fundamentally in their properties, ranging from off-policy methods such as DQN to on-policy methods such as PPO and Advantage Actor-Critic. The learned policies are compared to a baseline P-controller to evaluate their performance against simpler methods. The results from the model show that DRL needs around 100,000 samples to converge, which in a real setting would correspond to around 70 days of learning. The thesis also shows that the final policy learned by the agent does not offer considerable improvements over a simple control algorithm with respect to reward and performance when multiple experiments with varying loads and configurations are tested. Given the lack of data, the slowness of real-time systems, and the importance of robustness, the time to convergence required by a DRL agent is too long for an autoscaling solution to be deployed in the near future. Therefore, the author cannot recommend DRL for autoscaling in VNFs given the current state of the technology, and instead recommends simpler methods such as supervised machine learning or classical control theory.
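
To make the baseline concrete, the following is a minimal Python sketch of how a P-controller could drive VM autoscaling in a discrete-time load model of the kind the abstract describes. All specifics here, the gain k_p, the 70% utilization setpoint, and the per-VM capacity, are illustrative assumptions for exposition, not the model or parameters used in the thesis itself.

    # Illustrative sketch only: parameters and the load model are
    # assumptions, not taken from the thesis.

    def p_controller_step(n_vms, utilization, setpoint=0.7, k_p=5.0):
        """Proportional scaling: adjust the VM count in proportion
        to the deviation of utilization from the setpoint."""
        error = utilization - setpoint            # > 0 means overloaded
        return max(1, n_vms + round(k_p * error)) # never drop below one VM

    def simulate(loads, capacity_per_vm=100.0, n_vms=1):
        """Discrete-time loop: at each step, measure utilization of
        the current VM pool, then let the controller rescale it."""
        history = []
        for load in loads:
            utilization = min(1.0, load / (capacity_per_vm * n_vms))
            n_vms = p_controller_step(n_vms, utilization)
            history.append((load, utilization, n_vms))
        return history

    # Example: a rising then falling load profile forces the controller
    # to scale out and later scale back in.
    for load, util, vms in simulate([50, 150, 300, 600, 600, 200]):
        print(f"load={load:4.0f}  util={util:.2f}  vms={vms}")

A DRL agent would replace p_controller_step with a learned policy that maps an observed state (for example utilization and queue lengths) to a scaling action, which is the setup the thesis evaluates with algorithms such as DQN and PPO.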
