A Comparison Between Deep Q-learning and Deep Deterministic Policy Gradient for an Autonomous Drone in a Simulated Environment

University essay from Mälardalen University, School of Innovation, Design and Engineering

Author: Dennis Tagesson (2021)


Abstract: This thesis investigates how Deep Q-Network (DQN), which uses a continuous state space and a discrete action space, compares in performance with Deep Deterministic Policy Gradient (DDPG), which uses continuous state and action spaces, when both are trained in an environment with continuous state and action spaces. The environment was a simulation in which the task for the algorithms was to steer a drone from a start position to a goal location. The purpose of this investigation is to gain insight into how important it is to consider the action space of the environment when choosing a reinforcement learning algorithm. For DQN, the action space of the environment was discretized by restricting the number of possible actions to six. A simulation experiment was conducted in which the algorithms were trained in the environment. The experiment was divided into six tests; in each test, the algorithms were trained for 5,000, 10,000, or 35,000 steps with one of two different goal locations. The experiment was followed by an exploratory analysis of the collected data, using four different metrics to determine performance. My analysis showed that DQN needed less experience than DDPG to learn a successful policy, and that DQN outperformed DDPG in all tests but one. These results show that, when choosing a reinforcement learning algorithm for a task, an algorithm whose state and action spaces match those of the environment is not necessarily the most effective one.
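The abstract's central implementation detail is the discretization of the drone's continuous action space into six actions for DQN. Below is a minimal Python sketch of one plausible mapping, assuming the six actions correspond to positive and negative unit steps along the three spatial axes; the abstract does not specify the actual mapping, so the action set, the function name discrete_to_continuous, and the scale parameter are illustrative assumptions, not the thesis's method.

    import numpy as np

    # Hypothetical discretization of a continuous 3D drone action space into
    # six discrete actions (one positive and one negative step per axis),
    # matching the abstract's restriction to six possible actions for DQN.
    # The exact mapping used in the thesis is not given in the abstract.
    DISCRETE_ACTIONS = np.array([
        [ 1.0,  0.0,  0.0],  # step along +x
        [-1.0,  0.0,  0.0],  # step along -x
        [ 0.0,  1.0,  0.0],  # step along +y
        [ 0.0, -1.0,  0.0],  # step along -y
        [ 0.0,  0.0,  1.0],  # step along +z
        [ 0.0,  0.0, -1.0],  # step along -z
    ])

    def discrete_to_continuous(action_index: int, scale: float = 1.0) -> np.ndarray:
        """Map a DQN action index (0-5) to a continuous control vector
        that the simulated environment can consume."""
        return scale * DISCRETE_ACTIONS[action_index]

Under this kind of wrapping, DQN selects an index from its six Q-values while DDPG outputs the continuous control vector directly, which is the contrast the thesis evaluates.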
