Deep Convolutional Neural Networks for Real-Time Single Frame Monocular Depth Estimation

University essay from Uppsala universitet/Avdelningen för systemteknik

Abstract: Vision based active safety systems have become more frequently occurring in modern vehicles to estimate depth of the objects ahead and for autonomous driving (AD) and advanced driver-assistance systems (ADAS). In this thesis a lightweight deep convolutional neural network performing real-time depth estimation on single monocular images is implemented and evaluated. Many of the vision based automatic brake systems in modern vehicles only detect pre-trained object types such as pedestrians and vehicles. These systems fail to detect general objects such as road debris and roadside obstacles. In stereo vision systems the problem is resolved by calculating a disparity image from the stereo image pair to extract depth information. The distance to an object can also be determined using radar and LiDAR systems. By using this depth information the system performs necessary actions to avoid collisions with objects that are determined to be too close. However, these systems are also more expensive than a regular mono camera system and are therefore not very common in the average consumer car. By implementing robust depth estimation in mono vision systems the benefits from active safety systems could be utilized by a larger segment of the vehicle fleet. This could drastically reduce human error related traffic accidents and possibly save many lives. The network architecture evaluated in this thesis is more lightweight than other CNN architectures previously used for monocular depth estimation. The proposed architecture is therefore preferable to use on computationally lightweight systems. The network solves a supervised regression problem during the training procedure in order to produce a pixel-wise depth estimation map. The network was trained using a sparse ground truth image with spatially incoherent and discontinuous data and output a dense spatially coherent and continuous depth map prediction. The spatially incoherent ground truth posed a problem of discontinuity that was addressed by a masked loss function with regularization. The network was able to predict a dense depth estimation on the KITTI dataset with close to state-of-the-art performance. 

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)