Benchmarking structure from motion algorithms with video footage taken from a drone against laser-scanner generated 3D models

University essay from Luleå tekniska universitet/Rymdteknik

Abstract: Structure from motion is a novel approach to generating 3D models of objects and structures. The dataset simply consists of a series of images of an object taken from different positions. The ease of data acquisition and the wide array of available algorithms make the technique easily accessible. The structure from motion method identifies features in all the images of the dataset, such as edges with gradients in multiple directions, tries to match these features between all the images, and then computes the relative motion that the camera underwent between any pair of images. From the correlated features it builds a 3D model, and it then creates a 3D point cloud of the scanned object with colour information. The implementations of the structure from motion method differ in how they solve the feature-correlation problem between the images of the dataset, in how they detect features, and in the alternatives they offer for sparse and dense reconstruction. These differences lead to variations in the final output across algorithms. This thesis benchmarked these algorithms in terms of accuracy and processing time. For this purpose, a terrestrial 3D laser scanner was used to scan structures and buildings and generate a ground-truth reference against which the structure from motion algorithms were compared. Video footage was then captured by a drone with a built-in camera flying around the structure or building, to generate the input for the structure from motion algorithms. Different structures were considered, taking into account how rich or poor in features they are, since this affects the result of the structure from motion algorithms.
The structure from motion algorithms generated 3D point clouds, which were then analysed with a tool such as CloudCompare to measure how similar they are to the laser-scanner data, and the runtime was recorded for comparison across all algorithms. A subjective analysis was also made, covering aspects such as how easy each algorithm is to use and how complete the produced model looks in comparison to the others. The comparison found that there is no absolute best algorithm, since each algorithm excels in different aspects. Some algorithms generate a model very fast, with execution time scaling linearly with the size of the input, but at the expense of accuracy. Others take a long time for dense reconstruction but generate almost complete models even in the presence of featureless surfaces, such as COLMAP's modified PatchMatch algorithm. The structure from motion methods were able to generate models with an accuracy of up to 3 cm when scanning a simple building, where Visual Structure from Motion and Open Multi-View Environment ranked among the most accurate. It is worth highlighting that the error grows as the complexity of the scene increases. Finally, it was found that the structure from motion method cannot correctly reconstruct structures with reflective surfaces or with repetitive patterns when the images are taken from mid to close range, as the resulting errors can be as high as 1 m on a large structure.
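
The comparison of a reconstructed point cloud against the laser-scanner reference can be illustrated with a nearest-neighbour cloud-to-cloud distance. The following is a minimal sketch and not the procedure used in the thesis or CloudCompare's implementation: the function names and the toy coordinates are hypothetical, and it assumes that both clouds have already been registered into the same coordinate frame and scale, in metres.

```python
import math


def cloud_to_cloud_distances(test_cloud, reference_cloud):
    """Distance from each test point to its nearest reference point.

    Brute-force O(n * m) search; suitable only for small illustrative clouds.
    """
    return [
        min(math.dist(p, q) for q in reference_cloud)
        for p in test_cloud
    ]


def summarize(distances):
    """Mean, RMS and maximum of a list of distances."""
    n = len(distances)
    mean = sum(distances) / n
    rms = math.sqrt(sum(d * d for d in distances) / n)
    return {"mean": mean, "rms": rms, "max": max(distances)}


# Hypothetical toy data in metres (not taken from the thesis).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
reconstructed = [(0.0, 0.0, 0.03), (1.0, 0.0, 0.0), (0.5, 1.0, 0.0)]

distances = cloud_to_cloud_distances(reconstructed, reference)
stats = summarize(distances)
print(stats)
```

For clouds with millions of points the brute-force search becomes impractical, and a spatial index such as a k-d tree or octree would normally replace the inner loop.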
