Decreasing Training Time of Reinforcement Learning Agents for Remote Tilt Optimization using a Surrogate Neural Network Approximator

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Jiaming Huang; [2023]

Abstract: One possible application of reinforcement learning (RL) in telecommunications is antenna tilt optimization. A key challenge, however, is that the handcrafted simulators used as environments are slow, which makes training RL agents time-consuming. To address this challenge, we propose a neural model that can be used as a drop-in replacement for the simulator. The proposed AI-enabled simulator uses a graph neural network (GNN) as its backbone, which serves as both the transition function and the immediate reward function of a Markov decision process. Graph convolutional networks let the simulator learn geographical information for each antenna. In addition, a skip connection is added to the output of the network to help generate the final state and reward. Together, these allow the model to capture the spatial information of the communication network and provide accurate trajectories for the RL agents. During the test stage, each agent injects an action into the environment, which returns the reward for that action and a new observation. The agents can thus learn from the environment and improve their performance over time. A comparison with a baseline deep neural network (DNN) model further supports the advantage of the GNN in capturing the spatial information of the communication network. The experimental results show that the proposed GNN-based simulator achieves competitive accuracy while significantly reducing training time compared with the baselines, namely the DNN model and the handcrafted simulator. In conclusion, the proposed neural model is a promising approach to efficient training of RL agents for antenna tilt optimization: it significantly reduces training time while maintaining competitive accuracy and capturing the spatial information of the communication network.
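
As a rough illustration of the idea in the abstract, the sketch below shows a surrogate environment in which two graph convolution layers map the current per-antenna state and a tilt action to a next state and a reward. It is not the thesis implementation: the class name `SurrogateTiltEnv`, the layer sizes, the per-antenna reward, and the reading of the skip connection as "add the input state to the predicted change" are all assumptions, and the weights are random rather than trained.

```python
import numpy as np


def normalized_adjacency(adj):
    """Symmetric GCN normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


class SurrogateTiltEnv:
    """Hypothetical GNN surrogate mapping (state, action) to (next_state, reward)."""

    def __init__(self, adj, state_dim, hidden_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.a_norm = normalized_adjacency(np.asarray(adj, dtype=float))
        in_dim = state_dim + 1  # per-antenna features plus one tilt action
        # Untrained weights (assumption): a real surrogate would be fitted
        # to trajectories collected from the handcrafted simulator.
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden_dim))
        self.w2 = rng.normal(0.0, 0.1, (hidden_dim, state_dim + 1))
        self.state = np.zeros((self.a_norm.shape[0], state_dim))

    def reset(self, state):
        self.state = np.asarray(state, dtype=float)
        return self.state

    def step(self, action):
        # Concatenate each antenna's state with its tilt action.
        act = np.asarray(action, dtype=float)[:, None]
        x = np.concatenate([self.state, act], axis=1)
        # Two graph convolutions: aggregate over neighbours, then transform.
        h = np.maximum(self.a_norm @ x @ self.w1, 0.0)
        out = self.a_norm @ h @ self.w2
        # Skip connection (assumed form): the network predicts a state change.
        next_state = self.state + out[:, :-1]
        reward = out[:, -1]  # one reward value per antenna (assumption)
        self.state = next_state
        return next_state, reward


# Example: three antennas in a line, four features each.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
env = SurrogateTiltEnv(adj, state_dim=4)
env.reset(np.ones((3, 4)))
next_state, reward = env.step([1.0, 0.0, -1.0])
```

Because `step` and `reset` mirror a conventional environment interface, an agent written against the handcrafted simulator could in principle call the surrogate in the same way.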
