Building Detection in Deformed Satellite Images Using Mask R-CNN
Abstract: Background: Recent research on automatic building detection relies on aerial and satellite images. Automatic building detection from satellite images is useful for urban planning and for identifying voids after natural disasters. Detecting buildings in satellite images by hand is time-consuming and inefficient, so a deep-learning-based Mask R-CNN (Mask Region-based Convolutional Neural Network) is used to detect and segment the buildings. To evaluate the model, different augmentations are applied to the test dataset. A TTA (Test Time Augmentation) wrapper is used to evaluate the trained model on the test dataset and to report how accurately buildings are detected under each augmentation. Objectives: The main goal of this research is to build a model that detects and segments every building using the data provided by Sony and that scales to identifying buildings across different countries from satellite images. The model should also deliver an Average Precision in the range of 0.5 to 1 for every augmentation applied to the test dataset. Methods: A systematic literature review was conducted to choose a suitable algorithm for detecting and segmenting buildings with an Average Precision in the desired range. The review identified Mask R-CNN as an effective algorithm for both detection and segmentation. Augmentation is generally applied during training to increase the size of the dataset and improve the model's performance. Because the provided dataset is already large, an experiment was conducted in which the building detection model was trained with Mask R-CNN without augmentation. The aim of the research is to evaluate the performance of the trained model, not to improve it. 
TTA, which applies augmentation to the test dataset, is therefore used to evaluate the performance of the trained model. Results: Following the literature review, the Mask R-CNN algorithm was used to build a model for building detection and segmentation. The model, trained without augmentation, is evaluated with TTA on the test dataset to calculate an Average Precision for each augmentation and a mAP (mean Average Precision) over all augmented images. The Average Precision values for the different augmentations fall in the range of 0.5 to 1, except for the noise augmentation, which falls below the desired range. Conclusions: The Mask R-CNN model performed well at prediction. The Average Precision for each augmentation was calculated with TTA. The best-performing augmentations for detecting and segmenting buildings are horizontal flip, vertical flip, brightness, and contrast. The noise augmentation performs poorly, so it can be excluded from the best combination of augmentations.
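
The evaluation described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the `predict` callable, the augmentation strengths (brightness factor 1.2, contrast factor 1.5, noise standard deviation 0.1), the IoU threshold of 0.5, and the choice to flip the ground-truth masks together with the image are all assumptions made for illustration. It computes one Average Precision per augmentation and their mean.

```python
import numpy as np

def mask_iou(a, b):
    """Intersection over union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def average_precision(pred_masks, scores, gt_masks, iou_thr=0.5):
    """AP at one IoU threshold (area under the interpolated PR curve)."""
    matched, tp = set(), []
    for i in np.argsort(scores)[::-1]:          # highest score first
        ious = [0.0 if j in matched else mask_iou(pred_masks[i], g)
                for j, g in enumerate(gt_masks)]
        j = int(np.argmax(ious)) if ious else -1
        if j >= 0 and ious[j] >= iou_thr:
            matched.add(j)                      # each building matches once
            tp.append(1.0)
        else:
            tp.append(0.0)
    cum_tp = np.cumsum(np.array(tp))
    precision = cum_tp / (np.arange(len(tp)) + 1)
    recall = cum_tp / max(len(gt_masks), 1)
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for k in range(len(mpre) - 2, -1, -1):      # make precision non-increasing
        mpre[k] = max(mpre[k], mpre[k + 1])
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

def evaluate_tta(predict, image, gt_masks, seed=0):
    """Return ({augmentation: AP}, mean AP) for one image scaled to [0, 1].

    `predict` is a hypothetical callable: image -> (boolean masks, scores).
    """
    rng = np.random.default_rng(seed)
    same = lambda m: m                          # photometric: masks unchanged
    augmentations = {                           # name: (image fn, mask fn)
        "horizontal_flip": (lambda x: x[:, ::-1], lambda m: m[:, ::-1]),
        "vertical_flip": (lambda x: x[::-1], lambda m: m[::-1]),
        "bright": (lambda x: np.clip(x * 1.2, 0, 1), same),
        "contrast": (lambda x: np.clip((x - 0.5) * 1.5 + 0.5, 0, 1), same),
        "noise": (lambda x: np.clip(x + rng.normal(0, 0.1, x.shape), 0, 1), same),
    }
    ap = {}
    for name, (aug_image, aug_mask) in augmentations.items():
        masks, scores = predict(aug_image(image))
        ap[name] = average_precision(masks, scores,
                                     [aug_mask(g) for g in gt_masks])
    return ap, float(np.mean(list(ap.values())))

# Toy usage: one bright square "building" and a threshold-based stand-in model.
image = np.zeros((8, 8))
image[2:6, 1:5] = 0.9
gt = [image > 0.5]

def toy_predict(img):
    mask = img > 0.5
    return ([mask], [1.0]) if mask.any() else ([], [])

ap_per_aug, mean_ap = evaluate_tta(toy_predict, image, gt)
```

In a real evaluation, `predict` would wrap the trained Mask R-CNN, and the per-augmentation values would be averaged over the whole test dataset rather than a single image.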