Evaluation of the CNN Based Architectures on the Problem of Wide Baseline Stereo Matching

University essay from KTH/Datorseende och robotik, CVAP

Abstract: Three-dimensional information is often used in robotics and 3D-mapping. There exist several ways to obtain a three-dimensional map. However, the time of flight used in the laser scanners or the structured light utilized by Kinect-like sensors sometimes are not sufficient. In this thesis, we investigate two CNN based stereo matching methods for obtaining 3D-information from a grayscaled pair of rectified images.While the state-of-the-art stereo matching method utilize a Siamese architecture, in this project a two-channel and a two stream network are trained in an attempt to outperform the state-of-the-art. A set of experiments were performed to achieve optimal hyperparameters. By changing one parameter at the time, the networks with architectures mentioned above are trained. After a completed training the networks are evaluated with two criteria, the error rate, and the runtime.Due to time limitations, we were not able to find optimal learning parameters. However, by using settings from [17] we train a two-channel network that performed almost on the same level as the state-of-the-art. The error rate on the test data for our best architecture is 2.64% while the error rate for the state-of-the-art Siamese network is 2.62%. We were not able to achieve better performance than the state-of-the-art, but we believe that it is possible to reduce the error rate further. On the other hand, the state-of-the-art Siamese stereo matching network is more efficient and faster during the disparity estimation. Therefore, if the time efficiency is prioritized, the Siamese based network should be considered.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)