Network Drone Control using Deep Reinforcement Learning

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Alex Hermansson Grobgeld; [2020]


Abstract: In this work, a reinforcement learning approach is adopted to control a drone in a cellular network. The goal is to find paths between arbitrary locations such that low radio quality areas, defined with respect to the signal-to-interference-plus-noise ratio, are avoided at the cost of longer flight paths. The controlling agent learns to take decisions without access to any propagation model, learning only through feedback in the form of a reward signal evaluating its behavior. The reward function, designed for this particular problem, contains simple parameters for shifting the agent's focus toward shorter paths or paths with higher radio quality. The proposed agent uses a learning algorithm that combines Double Deep Q-Networks with Hindsight Experience Replay to handle the stochastic environment with multiple goals. A neural network is used to approximate the optimal Q-values, and experiences are collected using a Boltzmann exploration policy. Three different scenarios are studied: flight trajectories at constant altitude, with and without measurement noise in the radio quality, and flight trajectories at varying altitudes with measurement noise. In all scenarios, simulation results show that the agent successfully avoids low radio quality areas by taking longer flight paths, as desired. The probability of flying through areas with low radio quality is reduced by between 62 and 75 percent compared to the baseline agent, which flies greedily toward the goal. In 90 percent of the evaluation instances, for all three scenarios, the flight paths are shorter than twice the shortest possible path, and the median length is around 1.3 times the shortest path. Thus, the reinforcement learning agent holds a clear advantage over the baseline for applications where the radio quality is of high importance.
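The Double Deep Q-Network idea mentioned in the abstract decouples action selection (online network) from action evaluation (target network) to reduce Q-value overestimation. The sketch below illustrates the standard Double DQN target computation for a single transition; the function name, arguments, and discount factor are illustrative assumptions, not taken from the thesis itself:

```python
import numpy as np

def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Standard Double DQN target for one transition (illustrative sketch).

    The online network's Q-values pick the next action; the target
    network's Q-values evaluate it. For terminal transitions the
    target is just the reward.
    """
    if done:
        return reward
    # Selection with the online net, evaluation with the target net.
    best_action = int(np.argmax(next_q_online))
    return reward + gamma * next_q_target[best_action]
```

Hindsight Experience Replay would additionally relabel stored transitions with goals that were actually reached, so that even failed flights produce useful learning signal for the multi-goal setting.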
In the case of constant flight altitude, it is also possible to visualize and gain insight into the decision-making process through the learned value function. It is evident that this function reflects the radio quality, as one can see patterns resembling those of the underlying signal distribution.
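The Boltzmann exploration policy referred to in the abstract samples actions with probability proportional to exp(Q/τ), so higher-valued actions are preferred but never chosen deterministically. A minimal sketch, assuming NumPy and a scalar temperature parameter (both are presentation choices here, not details from the thesis):

```python
import numpy as np

def boltzmann_policy(q_values, temperature=1.0, rng=None):
    """Sample an action via a softmax over Q-values (illustrative sketch).

    High temperature -> near-uniform exploration;
    low temperature  -> nearly greedy action selection.
    Returns the sampled action index and the action probabilities.
    """
    rng = rng if rng is not None else np.random.default_rng()
    q = np.asarray(q_values, dtype=float)
    # Subtract the max before exponentiating for numerical stability.
    logits = (q - q.max()) / temperature
    probs = np.exp(logits)
    probs /= probs.sum()
    action = int(rng.choice(len(q), p=probs))
    return action, probs
```

Annealing the temperature over training is a common way to shift gradually from exploration toward exploitation.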
