University essay from KTH/Skolan för informations- och kommunikationsteknik (ICT)

Abstract: This master's thesis presents the design and implementation of a CNN-based architecture for image recognition, developed as part of a larger project on fashion recommendation with deep learning. Concretely, the network performs localization and segmentation tasks. To that end, the most well-known state-of-the-art localization and segmentation networks were first analysed. A multi-task network performing RoI pixel-wise segmentation was then created. This proposal addresses the weaknesses detected in pre-existing networks when applied to fashion recommendation, namely segmentation that is not fine-grained enough and limited computational efficiency. To improve the detail of the segmentation, the network works pixel-wise, i.e. it performs a classification task for each pixel of the image, which makes it better suited to capturing all the details present in the analysed images. However, a pixel-wise task requires working at pixel resolution, so the number of operations is usually large. To reduce the total number of operations and increase computational efficiency, the pixel-wise segmentation is performed only in the meaningful regions of the image (Regions of Interest), which are themselves computed by the network (RoI masks). After a study of recent deep learning libraries, the network was successfully implemented. Finally, a set of experiments was satisfactorily conducted to verify that the design operates correctly. Comparing the results obtained during testing against the most well-known architectures is outside the scope of this thesis, because the experimental conditions, especially the dataset, were not suitable for doing so. Nevertheless, the proposed network is fully prepared for this evaluation in the future, once the required experimental conditions are available.
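
The efficiency argument above, classifying pixels only inside the Regions of Interest, can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the function names, the linear per-pixel classifier standing in for the network head, and the toy shapes are all assumptions made for illustration.

```python
import numpy as np


def segment_in_roi(features, roi_mask, weights, bias, background=0):
    """Classify only the pixels that fall inside the RoI mask.

    features: (H, W, C) per-pixel feature map
    roi_mask: (H, W) boolean array, True inside a region of interest
    weights:  (C, K) hypothetical linear classifier (stand-in for the network head)
    bias:     (K,) classifier bias
    Returns an (H, W) integer label map; pixels outside the RoI keep `background`.
    """
    labels = np.full(roi_mask.shape, background, dtype=np.int64)
    roi_features = features[roi_mask]        # (N, C), N = number of RoI pixels
    scores = roi_features @ weights + bias   # (N, K) class scores, RoI pixels only
    labels[roi_mask] = scores.argmax(axis=1)
    return labels


def classified_fraction(roi_mask):
    """Fraction of image pixels that the pixel-wise classifier actually processes."""
    return float(roi_mask.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(8, 8, 3))    # toy feature map
    roi = np.zeros((8, 8), dtype=bool)
    roi[2:6, 2:6] = True                     # one 4x4 region of interest
    weights = rng.normal(size=(3, 4))        # 4 hypothetical classes
    bias = np.zeros(4)
    label_map = segment_in_roi(features, roi, weights, bias)
    print(label_map.shape, classified_fraction(roi))
```

In this toy setting the classifier touches 16 of the 64 pixels, so the per-pixel work scales with the RoI area rather than with the full image resolution.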

