On the effect of architecture on deep learning based features for homography estimation
Abstract: Keypoint detection and description are the first step of homography and essential matrix estimation, which in turn are used in Visual Odometry and Visual SLAM. This work explores how the choice of deep learning architecture for such keypoints affects speed and accuracy. The fully convolutional networks, each with heads for both the detector and the descriptor, are trained through an existing self-supervised method in which correspondences are obtained from known, randomly sampled homographies. A new strategy for choosing negative correspondences for the descriptor loss is presented, which allows more flexibility in the architecture design. The new strategy turns out to be essential, as it enables networks that outperform the learnt baseline at no cost in inference time. Varying the model size leads to a trade-off between speed and accuracy: all models outperform ORB in homography estimation, but only the larger models approach SIFT’s performance, performing about 1-7% worse. Training for longer and with additional types of data might give the push needed to outperform SIFT. The smallest models are 3× faster and use 50× fewer parameters than the learnt baseline, yet they still require 3× as much time as SIFT while performing about 10-30% worse. However, there is still room for improvement through optimization methods that go beyond architecture modification, e.g. quantization, which might make the method faster than SIFT.
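To make the self-supervision idea concrete, the following is a minimal sketch of how a known, randomly sampled homography yields ground-truth correspondences between an image and its warped copy. It is an illustration under stated assumptions, not the training code of this work: the function names `sample_homography`, `warp_points` and `positive_pairs`, the corner-perturbation sampling scheme, the `max_shift` range and the pixel `threshold` are all hypothetical choices made here. The strategy for choosing negative correspondences presented in this work is not reproduced.

```python
import numpy as np


def sample_homography(rng, width, height, max_shift=0.15):
    """Sample a homography by randomly displacing the four image corners.

    Assumption: corner perturbation is one common sampling scheme; the
    source does not state which scheme is actually used.
    """
    src = np.array([[0, 0], [width, 0], [width, height], [0, height]], dtype=float)
    shift = rng.uniform(-max_shift, max_shift, size=(4, 2)) * [width, height]
    dst = src + shift
    # Direct linear transform: two equations per corner pair, solve A h = 0.
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.array(rows))
    H = vt[-1].reshape(3, 3)  # null-space vector of A, reshaped to 3x3
    return H / H[2, 2]


def warp_points(H, pts):
    """Apply homography H to an (N, 2) array of pixel coordinates."""
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    warped = homog @ H.T
    return warped[:, :2] / warped[:, 2:3]


def positive_pairs(H, pts_a, pts_b, threshold=3.0):
    """Boolean (N, M) mask: True where pts_a[i] warped by H lands near pts_b[j]."""
    dists = np.linalg.norm(warp_points(H, pts_a)[:, None, :] - pts_b[None, :, :], axis=-1)
    return dists <= threshold


rng = np.random.default_rng(0)
H = sample_homography(rng, 640, 480)
pts_a = rng.uniform([0, 0], [640, 480], size=(5, 2))
pts_b = warp_points(H, pts_a)  # locations of the same points in the warped image
mask = positive_pairs(H, pts_a, pts_b)
```

Because the homography is known, each keypoint location in one image can be mapped exactly into the other, so no manual labels are needed; pairs outside the mask are the pool from which negative correspondences could be drawn.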