Research and Application of 6D Pose Estimation for Mobile 3D Cameras

University essay from KTH, School of Electrical Engineering and Computer Science (EECS)

Abstract: This work addresses deep-learning-based 6 Degrees-of-Freedom (DoF) pose estimation using the 3D cameras of an iPhone 13 Pro. The task of pose estimation is to estimate the spatial rotation and translation of an object given 2D or 3D images of it. When training a pose estimation network, a common way to expand the training dataset is to generate synthetic images, which requires a 3D mesh of the target object. Although several well-known datasets provide such 3D object files, obtaining a mesh for a customized real-world object remains a problem. Typical 3D scanners are mainly designed for industrial use and are usually expensive. In this project we investigated whether the 3D cameras on Apple devices can replace industrial 3D scanners in the pose estimation pipeline and what factors influence the results during scanning. For data synthesis, we introduced a pose sampling method that samples viewpoints uniformly on a sphere. Random transformations and background images from the SUN2012 dataset are applied, and the synthetic images are rendered with Blender. We picked five test objects with different sizes and surfaces. Each object was scanned with both the front TrueDepth camera and the rear Light Detection and Ranging (LiDAR) camera using the ‘3d Scanner App’ on iOS. The network we used is based on PVNet, which uses a pixel-wise voting scheme to find 2D keypoints in RGB images and an uncertainty-driven Perspective-n-Point (PnP) solver to compute the pose. We obtained both quantitative and qualitative results for each instance: i) the TrueDepth camera outperforms the LiDAR camera in most scenarios, and ii) the advantage of the TrueDepth camera is more pronounced when an object has a less reflective surface and high-contrast texture. We also picked three baseline objects from the Linemod dataset. Although the average accuracy is lower than in the original paper, our baseline instances show a trend similar to the original paper’s results. In conclusion, we showed that the 3D cameras on the iPhone are suitable for the pose estimation pipeline.
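
The abstract mentions a pose sampling method that places viewpoints uniformly on a sphere for rendering synthetic training images. As a rough illustration only (the thesis may use a different scheme), a minimal Python sketch of one common approach, a Fibonacci lattice, could look like this; the function name and parameters are illustrative, not taken from the thesis:

```python
import numpy as np

def fibonacci_sphere(n_views: int, radius: float = 1.0) -> np.ndarray:
    """Return n_views points spread nearly uniformly over a sphere.

    The points can serve as candidate camera positions around an object
    when rendering synthetic views (e.g. in Blender).
    """
    i = np.arange(n_views)
    golden_ratio = (1.0 + 5.0 ** 0.5) / 2.0
    # Azimuth advances by the golden angle; the polar angle is spaced so
    # that each point covers an approximately equal area of the sphere.
    theta = 2.0 * np.pi * i / golden_ratio              # azimuth
    phi = np.arccos(1.0 - 2.0 * (i + 0.5) / n_views)    # polar angle
    x = radius * np.sin(phi) * np.cos(theta)
    y = radius * np.sin(phi) * np.sin(theta)
    z = radius * np.cos(phi)
    return np.stack([x, y, z], axis=1)

if __name__ == "__main__":
    views = fibonacci_sphere(200)
    print(views.shape)  # (200, 3) camera positions on the unit sphere
```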
