Meta-Pseudo Labelled Multi-View 3D Shape Recognition

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: The field of computer vision has long pursued the challenge of understanding the three-dimensional world. This endeavour is further fuelled by the increasing demand for technologies that rely on accurate perception of the 3D environment such as autonomous driving and augmented reality. However, the labelled data scarcity in the 3D domain continues to be a hindrance to extensive research and development. Semi-Supervised Learning is a valuable tool to overcome data scarcity yet most of the state-of-art methods are primarily developed and tested for two-dimensional vision problems. To address this challenge, there is a need to explore innovative approaches that can bridge the gap between 2D and 3D domains. In this work, we propose a technique that both leverages the existing abundance of two-dimensional data and makes the state-of-art semi-supervised learning methods directly applicable to 3D tasks. Multi-View Meta Pseudo Labelling (MV-MPL) combines one of the best-performing architectures in 3D shape recognition, Multi-View Convolutional Neural Networks, together with the state-of-art semi-supervised method, Meta Pseudo Labelling. To evaluate the performance of MV-MPL, comprehensive experiments are conducted on widely used shape recognition benchmarks ModelNet40, ShapeNetCore-v1, and ShapeNetCore-v2, as well as, Objaverse-LVIS. The results demonstrate that MV-MPL achieves competitive accuracy compared to fully supervised models, even when only \(10%\) of the labels are available. Furthermore, the study reveals that the object descriptors extracted from the MV-MPL model exhibit strong performance on shape retrieval tasks, indicating the effectiveness of the approach beyond classification objectives. Further analysis includes the evaluation of MV-MPL under more restrained scenarios, the enhancements to the view aggregation and pseudo-labelling processes; and the exploration of the potential of employing multi-views as augmentations for semi-supervised learning.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)