Aerial View Image-Goal Localization with Reinforcement Learning

University essay from Lunds universitet/Matematik LTH

Abstract: With the increasing number and availability of unmanned aerial vehicles (UAVs) and other remote sensing devices (e.g. satellites), we have recently seen an explosion in computer vision methodologies tailored towards processing and understanding aerial view data. One application for such technologies is in the area of search-and-rescue (SAR), where the task is to localize and assist one or several people who are missing, for example after a natural disaster. In many cases the rough location may be known and a UAV can be deployed to explore a given, confined area to precisely localize the missing people. In such a time- and resource-constrained setting, controlling the UAV in an informed and intelligent manner, as opposed to exhaustively scanning the whole area along a pre-defined trajectory, could significantly improve the likelihood of succeeding with the mission. In this master's thesis we approach this type of problem by abstracting it as an aerial view image-goal localization task within a framework that emulates a SAR-like setup without requiring access to actual UAVs. In this framework an agent operates on top of a given satellite image and is tasked with localizing a specific goal, specified as a rectangular region within the satellite image, starting from a given location in the image. The agent is never allowed to observe the underlying satellite image in its entirety, not even at low resolution, and thus it has to operate solely based on sequentially observed partial glimpses when navigating towards the goal location. To tackle our suggested aerial view image-goal localization task, we propose AiRLoc, a fully trainable reinforcement learning (RL)-based model. AiRLoc can be trained with no annotations of any kind and is hence able to learn the localization task in an entirely self-supervised manner. Our experimental results suggest that AiRLoc significantly outperforms heuristic search methods as well as non-RL-based machine learning methods. The results also indicate that providing AiRLoc with mid-level vision capabilities (specifically, a pre-trained semantic segmentation network) can lead to even better performance. We also conduct a proof-of-concept study which suggests that AiRLoc, with or without semantic segmentation as input, outperforms humans on average.
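To make the task setup described in the abstract more concrete, the following is a minimal sketch of such an environment, not the thesis implementation. It assumes the image is divided into a grid of non-overlapping rectangular patches, that the agent moves one cell at a time in four directions, and that an episode ends when the goal patch is reached or a step budget runs out. The names `GlimpseEnv`, `MOVES` and `random_rollout`, the grid size and the step budget are all hypothetical choices made for illustration.

```python
import random

# Hypothetical action set: one-cell moves on the patch grid (an assumption).
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


class GlimpseEnv:
    """Toy image-goal localization on a grid of non-overlapping patches.

    The agent only ever receives the patch at its current cell and the
    goal patch; the full image is never returned to it.
    """

    def __init__(self, image, grid=5, max_steps=10, seed=0):
        self.image = image              # list of rows, each a list of pixels
        self.grid = grid                # grid x grid patches (assumed value)
        self.max_steps = max_steps      # step budget (assumed value)
        self.rng = random.Random(seed)
        self.ph = len(image) // grid    # patch height in pixels
        self.pw = len(image[0]) // grid  # patch width in pixels

    def _patch(self, cell):
        # Crop the rectangular region belonging to one grid cell.
        r, c = cell
        rows = self.image[r * self.ph:(r + 1) * self.ph]
        return [row[c * self.pw:(c + 1) * self.pw] for row in rows]

    def reset(self):
        # Sample two distinct cells: the start position and the goal.
        start, goal = self.rng.sample(range(self.grid ** 2), 2)
        self.pos = divmod(start, self.grid)
        self.goal = divmod(goal, self.grid)
        self.steps = 0
        return self._patch(self.pos), self._patch(self.goal)

    def step(self, action):
        # Move one cell, clamping at the image border.
        dr, dc = MOVES[action]
        r = min(max(self.pos[0] + dr, 0), self.grid - 1)
        c = min(max(self.pos[1] + dc, 0), self.grid - 1)
        self.pos = (r, c)
        self.steps += 1
        found = self.pos == self.goal
        done = found or self.steps >= self.max_steps
        return self._patch(self.pos), found, done


def random_rollout(env, rng):
    """Uninformed policy: pick random moves until the episode ends."""
    env.reset()
    found, done = False, False
    while not done:
        _, found, done = env.step(rng.choice(list(MOVES)))
    return found
```

A learned policy would replace the random choice in `random_rollout` with an action selected from the current glimpse and the goal patch. The success flag returned by the environment is the kind of signal that needs no human annotation, since the goal location is known by construction.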
